realDB: a genome and transcriptome resource for the red algae (phylum Rhodophyta).

Fei Chen1, Jiawei Zhang1, Junhao Chen1,2, Xiaojiang Li1, Wei Dong1, Jian Hu1, Meigui Lin1, Yanhui Liu1, Guowei Li3,4, Zhengjia Wang2, Liangsheng Zhang1.   

Abstract

With over 6000 species in seven classes, red algae (Rhodophyta) have diverse economic, ecological, experimental and evolutionary value. However, red algae are usually absent or rare in comparative analyses because genomic information for this phylum is under-represented in comprehensive genome databases. To improve access to ome data and omics tools for red algae, we provide 10 genomes and 27 transcriptomes representing all seven classes of Rhodophyta. Three genomes and 18 transcriptomes were de novo assembled and annotated in this project. A user-friendly BLAST suite, JBrowse tools and a search system were developed for online analyses. Detailed introductions to red algal taxonomy and sequencing status are also provided. In conclusion, realDB (realDB.algaegenome.org) provides a platform covering the most genome and transcriptome data for red algae together with a suite of tools for online analyses, and will attract both red algal biologists and those working on plant ecology, evolution and development. Database URL: http://realdb.algaegenome.org/.

Year:  2018        PMID: 30020436      PMCID: PMC6051438          DOI: 10.1093/database/bay072

Source DB:  PubMed          Journal:  Database (Oxford)        ISSN: 1758-0463            Impact factor:   3.451


Introduction

Red algae (phylum Rhodophyta) are valuable in daily life in several ways. They are important sources of food, such as the nori used in sushi and the pudding made from Irish moss. The high vitamin and protein content of red algae-derived foods has made them popular in east Asia for >1000 years (1). Red algae also have valuable ecological roles, such as producing oxygen in seawater, and some species are important in the formation of tropical reefs. In many Pacific atolls, red algae have contributed far more to reef structure than other organisms, including corals (2). In the oceans, various species of red algae are primary producers eaten by fish, crustaceans, worms and gastropods. Red algae occupy the second basal branch in the green lineage, following the Glaucophyta algae (3). Some red algal species are valuable for studying basic evolutionary questions such as the origin of multi-cellularity (4), symbiosis (5) and the evolution of photosynthesis. There are about 6000 species of red algae [Source: AlgaeBase (6), www.algaebase.org], ranging from single-celled species to complex, multi-cellular, ‘plant-like’ organisms. They are also excellent material for studying symbiosis, since many are inextricably associated with other organisms. Some species are used to produce agars, which serve as gelatinous food additives and, in science labs, as a support substance in culture media (7). The currently available data on red algae, such as those in AlgaeBase (www.algaebase.org) and the Porphyra website (http://www.porphyra.org/), are limited to morphological descriptions. The integration of genome data with morphological data is at an early stage.
For instance, the comprehensive database Phytozome V12 (phytozome.jgi.doe.gov/pz/portal.html), the plant genome duplication database (PGDD, chibba.agtec.uga.edu/duplication), the plant genome database (PlantGDB, plantgdb.org) (release V187) and the plant genome and systems biology (PGSB, pgsb.helmholtz-muenchen.de/plant) database do not include any red algal genome. The pico-Plaza 2.0 (bioinformatics.psb.ugent.be/plaza/versions/pico-plaza/) database includes one red algal genome, while the CoGe database has two and the Ensembl Plant database (plants.ensembl.org) has three (Figure 1). This dearth of information leads to an underestimation of the biological importance of red algae. Comparative analysis of red algal species, such as evolutionary studies of gene families (8, 9), non-coding genes and small RNAs (10), lags far behind the rest of plant science, partly because red algal genomic information is difficult to obtain. In breeding, an open platform integrating various omics data and species information is in demand among scientists and breeders (11).
Figure 1.

Genomes and transcriptomes included in realDB, compared with the datasets of several leading comprehensive databases. Phytozome V12, PGDD, PlantGDB and PGSB have not included any red algal genome. Pico-Plaza 2.9, CoGe and Ensembl Plant contain 1, 2 and 3 red algal genomes, respectively. In comparison, realDB now has 10 red algal genomes and 27 transcriptomes.

To keep pace with the rapidly increasing amounts of genomic and transcriptomic data, to exploit their potential for understanding developmental processes (12) and to assist molecular breeding, we built an online, searchable platform that integrates ome data with multiple omics tools. The information gained will be valuable for understanding red algal genomes and the evolution of plant genomes.

Data description

Dataset

Sequences from six genomes (including partial genomes) and seven transcriptomes, together with annotation data, were downloaded directly from publicly available websites. These data were shared freely on those websites, which provided few or no online analysis tools. Raw reads of four genomes and 20 transcriptomes were downloaded from the NCBI-SRA database (www.ncbi.nlm.nih.gov/sra) without annotations and were therefore de novo assembled and annotated in this study (Supplementary Table S1). Red algal transcription factors were predicted in this study using the hmmsearch tool from the HMMER software (hmmer.org) with default parameters and homology seeds from the Pfam database (pfam.xfam.org).
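The transcription factor prediction step can be illustrated with a minimal Python sketch. It assumes hmmsearch was run with its `--tblout` option, which writes a whitespace-delimited table whose first six columns are target name, target accession, query name, query accession, full-sequence E-value and full-sequence score. The E-value cutoff, the function names and the sample records (sequence IDs, scores) are hypothetical and are not taken from realDB; the paper states only that default parameters were used.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Hit(NamedTuple):
    target: str    # protein sequence that matched the profile
    family: str    # name of the Pfam profile HMM used as query
    evalue: float  # full-sequence E-value
    score: float   # full-sequence bit score


def parse_tblout(text: str, max_evalue: float = 1e-5) -> List[Hit]:
    """Parse hmmsearch --tblout text and keep hits at or below max_evalue.

    max_evalue is an illustrative cutoff, not a value from the paper.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        fields = line.split()
        evalue = float(fields[4])
        if evalue <= max_evalue:
            hits.append(Hit(fields[0], fields[2], evalue, float(fields[5])))
    return hits


def group_by_family(hits: List[Hit]) -> Dict[str, List[str]]:
    """Map each Pfam family to the sequences assigned to it."""
    families = defaultdict(list)
    for hit in hits:
        families[hit.family].append(hit.target)
    return dict(families)


# Hypothetical records laid out like a --tblout file.
SAMPLE_TBLOUT = """\
# target name  accession  query name  accession  E-value  score  bias ...
gene_0001 - Myb_DNA-binding PF00249.31 1.2e-15 55.1 0.3 3.0e-10 37.2 0.1 2.1 2 0 0 2 2 2 2 hypothetical protein
gene_0002 - Myb_DNA-binding PF00249.31 0.5 8.2 0.0 0.9 7.4 0.0 1.3 1 0 0 1 1 0 0 hypothetical protein
"""

if __name__ == "__main__":
    print(group_by_family(parse_tblout(SAMPLE_TBLOUT)))
```

Grouping hits by profile name gives one candidate list per transcription factor family, which is the shape of data a family-level search page would need.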

Assembly and annotation of genomes and transcriptomes

All original reads from the downloaded raw data were filtered using Trimmomatic (13) (https://github.com/timflutre/trimmomatic) to remove adapters and low-quality reads. The clean reads were then de novo assembled using Trinity (14) (https://github.com/trinityrnaseq/trinityrnaseq). Trinity produced the transcriptome files in FASTA format, and the assembled sequences were then used for gene identification. TransDecoder (https://github.com/TransDecoder/TransDecoder/), which is integrated in the Trinity software, was employed to detect gene regions. Kyoto Encyclopedia of Genes and Genomes (KEGG) and Enzyme Commission annotations were both obtained by BLAST searches of the genes against the KEGG database (https://www.kegg.jp/kegg/).
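Assemblies written as FASTA are usually summarized by contig number, assembled size and N50, the quantities reported in Table 1. The following Python sketch shows one common way to compute them. It is not the authors' script: the function names and the toy sequences are hypothetical, and N50 is taken here as the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly.

```python
from typing import Dict, List


def fasta_lengths(text: str) -> List[int]:
    """Return the length of every sequence in FASTA-formatted text."""
    lengths: List[int] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            lengths.append(0)           # start a new record
        elif lengths:
            lengths[-1] += len(line)    # sequence may span several lines
    return lengths


def n50(lengths: List[int]) -> int:
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0  # empty assembly


def assembly_stats(text: str) -> Dict[str, float]:
    """Summarize an assembly by contig number, size in Mb and N50."""
    lengths = fasta_lengths(text)
    return {
        "contig_number": len(lengths),
        "assembled_size_mb": sum(lengths) / 1e6,
        "n50": n50(lengths),
    }


# Toy FASTA text for illustration only.
SAMPLE_FASTA = """\
>contig1
ACGTAC
GT
>contig2
ACGTA
>contig3
ACG
"""

if __name__ == "__main__":
    print(assembly_stats(SAMPLE_FASTA))
```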

Database construction

realDB is hosted on Aliyun, one of the largest cloud server providers in the world, which gives it (i) scalability, as storage and computing capacity can be expanded easily, (ii) greater stability and (iii) simple maintenance. realDB runs on Linux Ubuntu Server 14.04.4 with Apache 2.4.18, Java (version 1.8) and JavaServer Pages (JSP) 2.0. It provides an efficient and friendly interface, with a simple and direct homepage, through which users can access a multitude of red algal data. The search system was created using PHP 7.0.22 and MySQL 5.7.20.

Results and discussions

An updating timeline for the sequenced red algae

To attract more visits to our online platform, we created a timeline on the homepage of realDB that is updated with recently sequenced genomes and transcriptomes of red algal species (Figure 2). The timeline consists of >1000 lines of code adapted from vis.js (http://visjs.org/) and presents several kinds of information, including the release time, genome size, reference and authors. Users can click a hyperlink to browse the reference or related websites for additional information. The defining feature of this timeline tool is that it is dynamic and interactive. Users can move along the timeline by dragging and zoom in or out by scrolling within the species timeline zone. The time-scale on the axis is adjusted automatically (http://almende.github.io/chap-links-library/graph.html), supporting scales ranging from milliseconds to years. We will create new items when genomes or transcriptomes from other red algal species become available.
Figure 2.

A snapshot of the realDB homepage. The head of the page consists of two parts: the menu and the Jumbotron. A timeline displays the updates of red algal genomes and transcriptomes, together with a related introduction to the sequencing of each species. The homepage also presents the realDB introduction, database news, highlights and statistics of global visits.

Bootstrap-based framework for display on various devices

realDB is built with Bootstrap, a widely used framework for building responsive, mobile-first sites (15). Users can therefore check genome releases and website news on mobile phones, iPads, laptops and desktops using all popular web browsers, including Google Chrome, Safari, Firefox and Internet Explorer, without display difficulties (Figure 2).

Various introductions to red algae for wide readership

The lack of red algal genome sequences in various databases is partly due to the limited knowledge of red algae. The molecular biological studies of red algae provide many useful results, such as information on systematics, physiology, ecology and evolution. Concise introductions to each species assist visitors with different backgrounds to quickly decide which species to analyze. Most of the information for each species was provided and cited from the book ‘Red Algae in the Genomic Age’ (16), including descriptions of life histories, forms and styles, genomic information and data sources. We also provided the description and classification of red algae on the website because general researchers and comparative genomic biologists usually do not have extensive knowledge of red algae classification or morphology.

realDB covers the largest number of red algae with ome data

The current realDB V1.0 gathers 10 available genomes and 27 transcriptomes, representing all seven classes of the Rhodophyta. Within this dataset, we de novo assembled the genomes of Galdieria phlegrea, Gracilariopsis lemaneiformis and Porphyridium cruentum, as well as 18 transcriptomes (Table 1). In total, realDB provides 37 ome datasets, including genome and transcriptome sequences (Figure 2). In comparison, the green lineage-oriented genome database Phytozome (phytozome.jgi.doe.gov) does not contain any genome or transcriptome data for red algae. Furthermore, the algae-oriented database Pico-PLAZA (bioinformatics.psb.ugent.be/plaza/versions/pico-plaza) harbors only one red algal genome, and the plant-specific database Ensembl Plant (plants.ensembl.org) includes only three red algal genomes (Figure 2). In realDB, we selected Chondrus crispus, Cyanidioschyzon merolae, Galdieria sulphuraria and G. phlegrea as flagship red algae, as they have the best genome sequencing and assembly.
Table 1.

The assembly and annotation of red algal genomes and transcriptomes in realDB

| Species | Data type | Read size | Contig number | Assembled size (Mb) | Gene models | N50 | Sequencing platform |
|---|---|---|---|---|---|---|---|
| Ahnfeltiopsis flabelliformis | Transcriptome | 1.5 Gb | 22 183 | 32.6 | 18 933 | 2748 | Illumina HiSeq 2000 |
| Betaphycus philippinensis | Transcriptome | 1.8 Gb | 23 279 | 28.8 | 15 948 | 2361 | Illumina HiSeq 2000 |
| Ceramium kondoi | Transcriptome | 931.1 Mb | 23 126 | 21.4 | 18 402 | 1385 | Illumina HiSeq 2000 |
| Dumontia simplex | Transcriptome | 1.5 Gb | 18 910 | 22.5 | 15 572 | 2048 | Illumina HiSeq 2000 |
| Eucheuma denticulatum | Transcriptome | 1.7 Gb | 24 656 | 27.9 | 15 478 | 2020 | Illumina HiSeq 2000 |
| Gloiopeltis furcata | Transcriptome | 1.3 Gb | 24 860 | 25.9 | 18 359 | 1594 | Illumina HiSeq 2000 |
| Gracilaria blodgettii | Transcriptome | 735.2 Mb | 19 691 | 22.5 | 15 563 | 2109 | Illumina HiSeq 2000 |
| Gracilaria chouae | Transcriptome | 1.4 Gb | 14 597 | 25.8 | 16 438 | 2904 | Illumina HiSeq 2000 |
| Gracilaria vermiculophylla | Transcriptome | 2 Gb | 13 444 | 25.2 | 15 663 | 3645 | Illumina HiSeq 2000 |
| Grateloupia catenata | Transcriptome | 1.6 Gb | 27 157 | 29 | 18 190 | 2015 | Illumina HiSeq 2000 |
| Grateloupia filicina | Transcriptome | 1.5 Gb | 49 587 | 38.6 | 25 696 | 1341 | Illumina HiSeq 2000 |
| Grateloupia livida | Transcriptome | 1.3 Gb | 14 934 | 22.2 | 14 131 | 2440 | Illumina HiSeq 2000 |
| Grateloupia turuturu | Transcriptome | 1.4 Gb | 15 739 | 25.5 | 15 639 | 2591 | Illumina HiSeq 2000 |
| Heterosiphonia pulchra | Transcriptome | 1.5 Gb | 33 225 | 28.6 | 19 183 | 1594 | Illumina HiSeq 2000 |
| Mazzaella japonica | Transcriptome | 1.4 Gb | 25 264 | 27 | 16 990 | 1981 | Illumina HiSeq 2000 |
| Neosiphonia japonica | Transcriptome | 1.3 Gb | 25 347 | 21.8 | 16 127 | 1204 | Illumina HiSeq 2000 |
| Porphyra purpurea | Transcriptome | 869.9 Mb | 20 323 | 24.86 | 55 453 | 1121 | 454 GS FLX |
| Compsopogon coeruleus | Transcriptome | 1015.8 Mb | 11 718 | 15.8 | 6844 | 2639 | Illumina HiSeq 2000 |
| Erythrolobus madagascarensis | Transcriptome | 732.3 Mb | 14 099 | 14.5 | 9152 | 1433 | Illumina HiSeq 2000 |
| Erythrolobus australicus | Transcriptome | 582.5 Mb | 14 227 | 15.4 | 11 857 | 1533 | Illumina HiSeq 2000 |
| Kappaphycus alvarezii | Transcriptome | 1.9 Gb | 34 095 | 40.8 | 20 253 | 1550 | Illumina HiSeq 2000 |
| Madagascaria erythrocladiodes | Transcriptome | 1.6 Gb | 51 999 | 48.9 | 39 931 | 1041 | Illumina HiSeq 2000 |
| Porphyridium aerugineum | Transcriptome | 1.2 Gb | 17 502 | 18 | 11 132 | 1450 | Illumina HiSeq 2000 |
| Rhodosorus marinus | Transcriptome | 1 Gb | 29 364 | 59.8 | 30 011 | 2092 | Illumina HiSeq 2000 |
| Rhodella maculata | Transcriptome | 1.5 Gb | 20 890 | 19.2 | 15 398 | 1434 | Illumina HiSeq 2000 |
| Timspurckia oligopyrenoides | Transcriptome | 1.5 Gb | 10 337 | 16.3 | 7826 | 2179 | Illumina HiSeq 2000 |
| Symphyocladia latiuscula | Transcriptome | 939.5 Mb | 32 966 | 22 | 17 377 | 765 | Illumina HiSeq 2000 |
| Galdieria phlegrea | Genome | 161 Mb | 11 559 | 13.7 | 10 303 | 1467 | 454 GS FLX Titanium |
| Porphyridium cruentum | Genome | 1.7 Gb | 7321 | 29.3 | 17 005 | 9536 | Illumina Genome Analyzer IIx |
| Gracilaria lemaneiformis | Genome | 2.8 Gb | 179 736 | 184 | 151 728 | 921 | Illumina MiSeq |
| Calliarthron tuberculosum | Genome | 1.6 Gb | 119 430 | 99.7 | 28 266 | 718 | 454 GS FLX Titanium |
| Chondrus crispus | Genome | 1.7 Gb | 925 | 104.8 | 9606 | 240 | Sanger technology |
| Cyanidioschyzon merolae | Genome | 1.8 Gb | 20 | 15.9 | 5331 | 859 119 | Whole genome random sequencing |
| Galdieria sulphuraria | Genome | 60 Mb | 117 | 12 | 7174 | 134 001 | ONT MinION |
| Porphyra umbilicalis | Genome | 558.41 Gb | 2126 | 85.1 | 14 399 | 202 021 | PacBio RS |
| Porphyridium purpureum | Genome | 7 Gb | 3014 | 19.45 | 7730 | 20 534 | Illumina GAIIx |
| Pyropia yezoensis | Genome | 1.9 Gb | 44 634 | 42.7 | 10 327 | 1669 | Illumina Genome Analyzer IIx |

A suite of tools for online analysis

Besides the downloadable datasets, online tools facilitate data retrieval and comparative analyses. Currently, realDB provides a complete suite of BLAST tools (Figure 3) consisting of BLASTn, BLASTx, BLASTp, tBLASTn and tBLASTx. This BLAST suite was constructed using the sequenceserver tool (www.sequenceserver.com/). A list of 21 advanced parameters, such as -evalue 1.0e-10 -max_target_seqs 10, can optionally be set for searches. For nucleotide searches, coding sequences (CDS) and genomes are kept as separate datasets that can be selected individually. The BLAST results are presented in an easy-to-use GenBank-style format; users can download the hits in FASTA format and the alignment data in tab-delimited or XML format.
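Downloaded tab-delimited results can be post-processed with a few lines of code. The Python sketch below assumes the file follows the standard 12-column BLAST+ tabular layout (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore); whether a given realDB download uses exactly this layout should be checked against the file itself. The function names, thresholds and sample records are hypothetical. The filter is applied after the search and is only loosely analogous to the -evalue and -max_target_seqs options, which act during the search.

```python
from collections import defaultdict
from typing import Dict, List

COLUMNS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def parse_blast_tab(text: str) -> List[dict]:
    """Parse 12-column tab-delimited BLAST output into a list of dicts."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(COLUMNS, line.split("\t")))
        row["evalue"] = float(row["evalue"])
        row["bitscore"] = float(row["bitscore"])
        rows.append(row)
    return rows


def top_hits(rows: List[dict], max_evalue: float = 1e-10,
             per_query: int = 10) -> Dict[str, List[str]]:
    """Keep hits passing the E-value cutoff, best bit score first, per query."""
    by_query = defaultdict(list)
    for row in rows:
        if row["evalue"] <= max_evalue:
            by_query[row["qseqid"]].append(row)
    return {
        query: [hit["sseqid"] for hit in
                sorted(hits, key=lambda r: r["bitscore"], reverse=True)[:per_query]]
        for query, hits in by_query.items()
    }


# Hypothetical records; real files would come from a BLAST download.
SAMPLE_ROWS = [
    ("query1", "subjB", "75.0", "120", "30", "0", "5", "124", "40", "159", "2e-12", "80.5"),
    ("query1", "subjA", "98.5", "200", "3", "0", "1", "200", "11", "210", "1e-100", "370"),
    ("query1", "subjC", "60.0", "50", "20", "0", "1", "50", "1", "50", "0.002", "30.1"),
]
SAMPLE_TAB = "\n".join("\t".join(fields) for fields in SAMPLE_ROWS)

if __name__ == "__main__":
    print(top_hits(parse_blast_tab(SAMPLE_TAB)))
```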
Figure 3.

The BLAST search provided by realDB. (A) Users can search any combination of datasets by clicking on each red algal species. (B) An example of a search result. Users can download the hits in FASTA format and the alignment data in tab-delimited and XML formats for further analysis.

JBrowse was incorporated into realDB (Figure 3), allowing users to browse, visualize and retrieve sequence data directly. Currently, we provide the JBrowse tool for C. crispus, C. merolae and G. sulphuraria, whose genomes are well assembled and annotated. With JBrowse, users can browse and analyze these genomes at various scales through a graphical interface. Detailed gene information, such as location, annotation and sequence, can be viewed and retrieved by zooming in and out of the genomic region of interest and clicking on the corresponding tracks.

The search tools in realDB provide a series of search services for CDS, protein, gene annotation, gene family, transcription factor and miRNA information (Figure 4). This information will be useful for both wet-lab and dry-lab biologists. Gene families, especially transcription factor families, control various physiological processes and are breeding targets (17–27). miRNAs have been extensively studied in land plants and green algae (12, 28, 29). However, little is known about their function and evolutionary trajectory in red algae. We incorporated into realDB four experimentally validated miRNA datasets, from Porphyridium purpureum (30) (Porphyridiophyceae), C. crispus (31) (Florideophyceae), Eucheuma denticulatum (32) (Florideophyceae) and P. yezoensis (33) (Bangiophyceae). Users can easily find a miRNA and related information through our search system.
Figure 4.

realDB offers a series of tools for online analysis. This menu lists the resources and tools integrated in realDB. A snapshot is presented for each menu item to help readers quickly find the related information.

Conclusion and future perspectives

Red algae (Rhodophyta) have a critical place in plant evolution as the second branch after Glaucophyta, attracting thousands of scientists in the areas of ecology, evolution and genomics. They are also of interest to people working on bioengineering, medicine and food science. The study of red algae is being reshaped by the rapid development of genomics. Facilitated by low-cost and fast sequencing technologies, more and more red algae have had their genomes and transcriptomes sequenced. realDB is dedicated to being the leading platform for analyzing red algal genomes by providing the latest omics data and online analysis tools. Currently, we provide the most genome and transcriptome data available for red algae, 37 datasets in total, freely to all researchers. realDB Version 1.0 is the first release of the database and will be updated when new datasets become available. Furthermore, we will incorporate additional bioinformatics tools for easier data access and online analyses. Since its release in September 2017, realDB has attracted the attention of scientists from around the world, and the website has been visited by researchers from 27 countries (as of April 2018). Everyone interested in realDB is encouraged to contact us for data sharing and collaboration. We are dedicated to collaborating with international teams to collect more data and develop more tools, hoping to make realDB the most influential database for red algal studies.

Supplementary data

Supplementary data are available at Database Online.

1.  One cell, two cell, red cell, blue cell: The persistence of a unicellular stage in multicellular life histories.

Authors:  R K Grosberg; R R Strathmann
Journal:  Trends Ecol Evol       Date:  1998-03       Impact factor: 17.712

2.  Evolutionary history of plant microRNAs.

Authors:  Richard S Taylor; James E Tarver; Simon J Hiscock; Philip C J Donoghue
Journal:  Trends Plant Sci       Date:  2014-01-07       Impact factor: 18.313

3.  New technologies accelerate the exploration of non-coding RNAs in horticultural plants.

Authors:  Degao Liu; Ritesh Mewalal; Rongbin Hu; Gerald A Tuskan; Xiaohan Yang
Journal:  Hortic Res       Date:  2017-07-05       Impact factor: 6.793

4.  Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Authors:  Manfred G Grabherr; Brian J Haas; Moran Yassour; Joshua Z Levin; Dawn A Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica di Palma; Bruce W Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev
Journal:  Nat Biotechnol       Date:  2011-05-15       Impact factor: 54.908

5.  Temporal and spatial control of gene expression in horticultural crops.

Authors:  Manjul Dutt; Sadanand A Dhekney; Leonardo Soriano; Raju Kandel; Jude W Grosser
Journal:  Hortic Res       Date:  2014-09-24       Impact factor: 6.793

6.  Genome and transcriptome analysis of the grapevine (Vitis vinifera L.) WRKY gene family.

Authors:  Min Wang; Alessandro Vannozzi; Gang Wang; Ying-Hai Liang; Giovanni Battista Tornielli; Sara Zenoni; Erika Cavallini; Mario Pezzotti; Zong-Ming Max Cheng
Journal:  Hortic Res       Date:  2014-03-26       Impact factor: 6.793

7.  Genome-wide analysis and expression profile of the bZIP transcription factor gene family in grapevine (Vitis vinifera).

Authors:  Jinyi Liu; Nana Chen; Fei Chen; Bin Cai; Silvia Dal Santo; Giovanni Battista Tornielli; Mario Pezzotti; Zong-Ming Max Cheng
Journal:  BMC Genomics       Date:  2014-04-13       Impact factor: 3.969

8.  Trimmomatic: a flexible trimmer for Illumina sequence data.

Authors:  Anthony M Bolger; Marc Lohse; Bjoern Usadel
Journal:  Bioinformatics       Date:  2014-04-01       Impact factor: 6.937

9.  Identification and Characterization of miRNAs in Chondrus crispus by High-Throughput Sequencing and Bioinformatics Analysis.

Authors:  Fan Gao; FangRu Nan; Wei Song; Jia Feng; JunPing Lv; ShuLian Xie
Journal:  Sci Rep       Date:  2016-05-19       Impact factor: 4.379

10.  Identification of conserved and novel microRNAs in Porphyridium purpureum via deep sequencing and bioinformatics.

Authors:  Fan Gao; Fangru Nan; Jia Feng; Junping Lv; Qi Liu; Shulian Xie
Journal:  BMC Genomics       Date:  2016-08-11       Impact factor: 3.969
