Gaurav Sablok, Regan J Hayward, Peter A Davey, Rosiane P Santos, Martin Schliep, Anthony Larkum, Mathieu Pernice, Rudy Dolferus, Peter J Ralph.
Abstract
Seagrasses and aquatic plants are important clades of higher plants, significant for carbon sequestration and marine ecological restoration. They are valuable because they allow us to understand how plants have developed traits to adapt to high-salinity and photosynthetically challenged environments. Here, we present a large-scale, phylogenetically profiled transcriptomics repository covering seagrasses and aquatic plants. SeagrassDB encompasses a total of 1,052,262 unigenes, with a minimum and maximum contig length of 8,831 bp and 16,705 bp, respectively. SeagrassDB provides access to 34,455 transcription factors, 470,568 PFAM domains, 382,528 Prosite models and 482,121 InterPro domains across 9 species. SeagrassDB allows comparative gene mining using BLAST-based approaches and subsequent retrieval of unigene sequences with associated features such as expression (FPKM values), gene ontologies, functional assignments, family-level classification, InterPro domains, KEGG orthology (KO), transcription factors and Prosite information. SeagrassDB is available to the scientific community for exploring the functional genic landscape of seagrasses and aquatic plants at http://115.146.91.129/index.php.
Year: 2018 PMID: 29426939 PMCID: PMC5807536 DOI: 10.1038/s41598-017-18782-0
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Summary statistics of transcriptomics in SeagrassDB.
| Summary Statistics | SI | HU | LM | HO | CS | PI | PO | ZA | ZM |
|---|---|---|---|---|---|---|---|---|---|
| Total number of reads (PE) | 30800346 | 39950720 | 37793836 | 42671860 | 41836870 | 43133914 | 70453120 | 55525824 | 60812923 |
| Total number of Unigenes | 94218 | 57490 | 169790 | 141858 | 112178 | 51707 | 79235 | 52741 | 293045 |
| Median length (bp) | 408 | 624 | 388 | 360 | 429 | 577 | 853 | 528 | 366 |
| Maximum contig length (bp) | 15898 | 14423 | 12316 | 8831 | 12258 | 12507 | 16705 | 15776 | 26925 |
| N50 (bp) | 1157 | 1741 | 938 | 724 | 1528 | 1836 | 2041 | 1672 | 1171 |
| Number of contigs (>1 kb) | 18721 | 21223 | 28134 | 19068 | 27509 | 18336 | 35285 | 16905 | 52326 |
| Number of predicted ORFs | 53254 | 33310 | 79652 | 66706 | 57517 | 27819 | 34245 | 24824 | 130627 |
| Unigenes with BLASTx against UniprotKB | 39965 | 36181 | 64552 | 79240 | 55494 | 32540 | 38849 | 31450 | 121446 |
| Unigenes with PFAM | 37192 | 32745 | 61879 | 75022 | 51916 | 29777 | 37467 | 30146 | 114424 |
| Unigenes with GO | 37036 | 32734 | 61343 | 75039 | 51523 | 29572 | 38389 | 30860 | 113401 |
| Unigenes with InterPro | 38232 | 34127 | 63042 | 76553 | 53439 | 30932 | 38062 | 30643 | 117091 |
| Unigenes with Prosite | 28570 | 22320 | 51819 | 65065 | 39200 | 21111 | 33130 | 26831 | 94482 |
| Unigenes with TF | 3045 | 3161 | 4444 | 3500 | 3652 | 2722 | 3033 | 2528 | 8370 |
Species abbreviations: Cymodocea serrulata (CS), Halodule uninervis (HU), Halophila ovalis (HO), Lemna minor (LM), Phyllospadix iwatensis (PI), Syringodium isoetifolium (SI), Zostera muelleri (ZM), Zostera marina (ZA) and Posidonia oceanica (PO).
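The per-species counts in the table can be cross-checked against the totals quoted in the abstract. The following is a minimal Python sketch of that check, with values copied from the table; the dictionary names (`counts`, `abstract_totals`) are illustrative and are not part of SeagrassDB.

```python
# Per-species annotation counts copied from the summary table above.
# Column order: SI, HU, LM, HO, CS, PI, PO, ZA, ZM.
counts = {
    "unigenes": [94218, 57490, 169790, 141858, 112178, 51707, 79235, 52741, 293045],
    "pfam": [37192, 32745, 61879, 75022, 51916, 29777, 37467, 30146, 114424],
    "interpro": [38232, 34127, 63042, 76553, 53439, 30932, 38062, 30643, 117091],
    "prosite": [28570, 22320, 51819, 65065, 39200, 21111, 33130, 26831, 94482],
    "tf": [3045, 3161, 4444, 3500, 3652, 2722, 3033, 2528, 8370],
}

# Totals as quoted in the abstract.
abstract_totals = {
    "unigenes": 1052262,
    "pfam": 470568,
    "interpro": 482121,
    "prosite": 382528,
    "tf": 34455,
}

# Sum each row across the nine species.
totals = {name: sum(values) for name, values in counts.items()}

for name, total in totals.items():
    status = "matches" if total == abstract_totals[name] else "differs from"
    print(f"{name}: {total:,} ({status} the abstract)")
```

With the values as listed, all five row sums agree with the abstract.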
Figure 1. Contig binning across the assembled species in SeagrassDB.
BUSCO assessment of transcriptome completeness in SeagrassDB.
| BUSCO category | SI | HU | LM | HO | CS | PI | PO | ZA | ZM |
|---|---|---|---|---|---|---|---|---|---|
| Complete BUSCOs | 878 | 935 | 1056 | 742 | 800 | 887 | 1107 | 862 | 1113 |
| Complete and single-copy BUSCOs | 740 | 781 | 859 | 628 | 628 | 757 | 917 | 729 | 759 |
| Complete and duplicated BUSCOs | 138 | 154 | 197 | 114 | 172 | 130 | 190 | 133 | 334 |
| Fragmented BUSCOs | 161 | 148 | 107 | 179 | 153 | 111 | 112 | 139 | 95 |
| Missing BUSCOs | 401 | 357 | 277 | 519 | 487 | 442 | 221 | 439 | 232 |
| Total BUSCO groups searched | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 | 1440 |
For BUSCO, the complete embryophyta dataset was used as the lineage for assessing completeness in the "trans" (transcriptome) mode of BUSCO (Simão et al.[22]). BUSCO uses a set of evolutionarily informed, near-universal single-copy orthologs from OrthoDB v9. Species abbreviations: Cymodocea serrulata (CS), Halodule uninervis (HU), Halophila ovalis (HO), Lemna minor (LM), Phyllospadix iwatensis (PI), Syringodium isoetifolium (SI), Zostera muelleri (ZM), Zostera marina (ZA) and Posidonia oceanica (PO).
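The BUSCO categories are related by simple arithmetic: complete, fragmented and missing groups should partition the 1,440 groups searched, and single-copy plus duplicated should equal complete. The sketch below, assuming only the values in the table above, computes the percentage of complete BUSCOs per species and checks both relations; the function name `check_busco` is hypothetical and is not part of BUSCO.

```python
# BUSCO counts copied from the table above.
species = ["SI", "HU", "LM", "HO", "CS", "PI", "PO", "ZA", "ZM"]
busco = {
    "complete": [878, 935, 1056, 742, 800, 887, 1107, 862, 1113],
    "single_copy": [740, 781, 859, 628, 628, 757, 917, 729, 759],
    "duplicated": [138, 154, 197, 114, 172, 130, 190, 133, 334],
    "fragmented": [161, 148, 107, 179, 153, 111, 112, 139, 95],
    "missing": [401, 357, 277, 519, 487, 442, 221, 439, 232],
}
TOTAL_GROUPS = 1440


def check_busco(busco, total):
    """Return percent complete and two consistency flags per species."""
    report = {}
    for i, sp in enumerate(species):
        c = busco["complete"][i]
        report[sp] = {
            # Share of searched groups recovered as complete.
            "pct_complete": round(100 * c / total, 1),
            # complete + fragmented + missing should equal the total searched.
            "partition_ok": c + busco["fragmented"][i] + busco["missing"][i] == total,
            # single-copy + duplicated should equal complete.
            "split_ok": busco["single_copy"][i] + busco["duplicated"][i] == c,
        }
    return report


report = check_busco(busco, TOTAL_GROUPS)
for sp, row in report.items():
    print(sp, row)
```

With the values as listed, every column satisfies the partition check. The single-copy and duplicated counts sum to the complete count for every species except ZM (759 + 334 = 1,093 against 1,113 listed), which may reflect a transcription error in one of those cells.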
DOGMA based assessment of transcriptome completeness in SeagrassDB.
| CDA size | SI | HU | LM | HO | CS | PI | PO | ZA | ZM |
|---|---|---|---|---|---|---|---|---|---|
| Found | 1804 | 1811 | 1918 | 1473 | 1775 | 1707 | 1876 | 1676 | 1963 |
| Expected | 2017 | 2017 | 2017 | 2017 | 2017 | 2017 | 2017 | 2017 | 2017 |
| Completeness (%) | 89.44 | 89.79 | 95.09 | 73.03 | 88.00 | 84.63 | 93.01 | 83.09 | 97.32 |
Domain completeness of the assembled transcriptomes was assessed using DOGMA version 2.00 (Dohmen et al.[23]) based on 965 single-domain CDAs (Conserved Domain Arrangements) and 1,052 multiple-domain CDAs across eukaryotes. DOGMA uses a set of evolutionarily conserved protein domains modeled in PFAM. CDA size: the size of the CDAs that were found to be conserved in the core species; Found: the number of these CDAs that were found; Expected: the number of expected CDAs (all CDAs that were found to be conserved among the core species); Completeness: the percentage of expected CDAs that were found. Species abbreviations: Cymodocea serrulata (CS), Halodule uninervis (HU), Halophila ovalis (HO), Lemna minor (LM), Phyllospadix iwatensis (PI), Syringodium isoetifolium (SI), Zostera muelleri (ZM), Zostera marina (ZA) and Posidonia oceanica (PO).
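The completeness row follows directly from the definitions above: found CDAs divided by expected CDAs, expressed as a percentage. A minimal Python sketch that reproduces it from the table values is shown here; the variable names are illustrative and are not DOGMA output fields.

```python
# Found CDA counts copied from the DOGMA table above.
found = {
    "SI": 1804, "HU": 1811, "LM": 1918, "HO": 1473, "CS": 1775,
    "PI": 1707, "PO": 1876, "ZA": 1676, "ZM": 1963,
}

# Expected CDAs: 965 single-domain plus 1,052 multiple-domain arrangements.
EXPECTED = 965 + 1052

# Completeness values as listed in the table, for comparison.
listed = {
    "SI": 89.44, "HU": 89.79, "LM": 95.09, "HO": 73.03, "CS": 88.0,
    "PI": 84.63, "PO": 93.01, "ZA": 83.09, "ZM": 97.32,
}

# Completeness (%) = found / expected * 100, rounded to two decimals.
completeness = {sp: round(100 * n / EXPECTED, 2) for sp, n in found.items()}

for sp, pct in completeness.items():
    print(f"{sp}: {pct:.2f}% (table: {listed[sp]:.2f}%)")
```

The recomputed percentages agree with the listed values to two decimal places, and the expected count of 2,017 equals the sum of the single-domain and multiple-domain CDA sets.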
Figure 2. (a) Venn diagram, drawn with VennPainter (https://github.com/linguoliang/VennPainter), showing the shared single-copy orthologs across aquatic plant species; (b) shared single-copy orthologs across the Cymodoceaceae, Araceae and Hydrocharitaceae; (c) shared single-copy orthologs across the Zosteraceae, Posidoniaceae and Araceae.
Figure 3. Browsing SeagrassDB.
Figure 4. (a) Protein alignment of H+-ATPase; (b) and (c) structural conservation of H+-ATPases across land and aquatic plants.
Figure 5. Phylogenetic resolution of H+-ATPase across the evolutionary time scale.