Vitor Ramos, João Morais, Vitor M Vasconcelos.
Abstract
The dataset described here lays the groundwork for an online database of relevant cyanobacterial strains, named CyanoType (http://lege.ciimar.up.pt/cyanotype). The database includes categorized cyanobacterial strains useful for taxonomic, phylogenetic or genomic purposes, with associated information obtained by literature-based curation. The dataset lists 371 strains and represents the first version of the database (CyanoType v.1). Information for each strain includes strain synonymy and/or co-identity, strain categorization, habitat, accession numbers for molecular data, taxonomy and nomenclature notes according to three different classification schemes, hierarchical automatic classification, phylogenetic placement according to a selection of relevant studies (including this one), and important bibliographic references. The database will be updated periodically, namely by adding new strains that meet the criteria for inclusion and by revising and adding up-to-date metadata for strains already listed. A global 16S rDNA-based phylogeny is provided to assist users in choosing the appropriate strains for their studies.
Year: 2017 PMID: 28440791 PMCID: PMC5404626 DOI: 10.1038/sdata.2017.54
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Diagram illustrating the workflow followed during the construction and release of the dataset (standard flowchart symbols were used).
Number of cyanobacterial strains included in version 1 of the CyanoType dataset and present in the phylogenetic trees obtained in this study, by category: T, Type strain of the Type species; t, not the type strain, but phylogenetically closely related; R, Reference strain in Bergey's Manual of Systematic Bacteriology[9]; r, not the reference strain, but phylogenetically closely related; G, strain with its genome sequenced and publicly available; E, strain studied from exsiccata.
| Category* | Strains in dataset | Strains in phylogenetic trees |
| --- | --- | --- |
| T or t, only | 73 | 63 |
| T or t and R or r | 5 | 4 |
| T or t and G | 10 | 10 |
| T or t and R or r and G | 9 | 9 |
| R or r and G | 60 | 60 |
| R or r, only | 41 | 30 |
| G, only | 172 | 155 |
| E | 1 | 1 |
| TOTAL | 371 | 332 |
*See also the category descriptions in the Data Records section.
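As a quick consistency check on the table above, the per-category counts can be summed and compared with the reported totals. The sketch below is illustrative only: it assumes, based on the table caption, that the first numeric column counts strains in the dataset and the second counts strains present in the phylogenetic trees. The names `ROWS`, `column_totals` and `tree_coverage` are hypothetical and not part of CyanoType.

```python
# Rows transcribed from the table above.
# Assumption (from the caption): column 1 = strains in the dataset,
# column 2 = strains present in the phylogenetic trees.
ROWS = [
    ("T or t, only", 73, 63),
    ("T or t and R or r", 5, 4),
    ("T or t and G", 10, 10),
    ("T or t and R or r and G", 9, 9),
    ("R or r and G", 60, 60),
    ("R or r, only", 41, 30),
    ("G, only", 172, 155),
    ("E", 1, 1),
]

# Totals as reported in the TOTAL row of the table.
REPORTED_TOTALS = (371, 332)


def column_totals(rows):
    """Sum both numeric columns over all category rows."""
    in_dataset = sum(row[1] for row in rows)
    in_trees = sum(row[2] for row in rows)
    return in_dataset, in_trees


def tree_coverage(rows):
    """Fraction of dataset strains that also appear in the trees."""
    in_dataset, in_trees = column_totals(rows)
    return in_trees / in_dataset


if __name__ == "__main__":
    totals = column_totals(ROWS)
    print("computed totals:", totals)
    print("matches reported totals:", totals == REPORTED_TOTALS)
    print(f"share of strains in trees: {tree_coverage(ROWS):.1%}")
```

The rows sum to the reported totals, so the categories as listed are mutually exclusive: each strain is counted in exactly one row.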
Figure 2. Example of the use of the proposed subset of strains representing the cyanobacterial ‘tree of life’ (see Subset_Condens_Tree in the Data Records section and Phylogenetic analyses in Methods) to evaluate the phylogenetic placement of strains not included in Supplementary Fig. 1 due to having short 16S rRNA gene sequences (in bold).
The evolutionary history was inferred by using the Maximum Likelihood method based on the GTR+G+I model. Bootstrap values are indicated near internal branches; values below 50% were omitted. Information for each cyanobacterial strain includes the accession number of the nucleotide sequence, the strain ID, any taxonomic synonyms or other strain names (in parentheses), and co-identical strains or other strain codes (in parentheses). Letters after the colon indicate the categorization of strains as follows (see also Strain_Category in the Data Records section): T, Type strain of the Type species; t, not the type strain, but phylogenetically closely related; R, Reference strain in Bergey's Manual of Systematic Bacteriology[9]; r, not the reference strain, but phylogenetically closely related; G, strain with its genome sequenced and publicly available; E, strain studied from exsiccata. A letter in parentheses means that there is a taxonomy-related uncertainty with the taxon name (see taxonomic comments) or that the assigned strain category could not yet be fully confirmed (e.g., for provisional species names). The outgroup was pruned from the tree for clarity. The scale bar represents nucleotide substitutions per site.
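The category letters described in the legend can be decoded programmatically. The following is a minimal sketch, not the authors' tooling: it assumes a category string is a run of the letters T, t, R, r, G and E, with each uncertain letter individually wrapped in parentheses (for example `T(G)`). The exact label syntax used in the tree files is not specified here, so the input format, the function name `decode_categories` and the dictionary `CATEGORY_MEANINGS` are all hypothetical.

```python
import re

# Meanings taken from the figure legend.
CATEGORY_MEANINGS = {
    "T": "type strain of the type species",
    "t": "not the type strain, but phylogenetically closely related",
    "R": "reference strain in Bergey's Manual of Systematic Bacteriology",
    "r": "not the reference strain, but phylogenetically closely related",
    "G": "genome sequenced and publicly available",
    "E": "studied from exsiccata",
}

# Matches either a single parenthesized letter, e.g. "(G)", or a bare letter.
# Assumption: uncertain letters are parenthesized one at a time.
_CATEGORY_TOKEN = re.compile(r"\(([TtRrGE])\)|([TtRrGE])")


def decode_categories(code):
    """Return a list of (letter, meaning, uncertain) tuples for a category string."""
    decoded = []
    for match in _CATEGORY_TOKEN.finditer(code):
        letter = match.group(1) or match.group(2)
        uncertain = match.group(1) is not None
        decoded.append((letter, CATEGORY_MEANINGS[letter], uncertain))
    return decoded


if __name__ == "__main__":
    for letter, meaning, uncertain in decode_categories("T(G)"):
        flag = " (uncertain)" if uncertain else ""
        print(f"{letter}: {meaning}{flag}")
```

Keeping the uncertainty flag separate from the letter lets a downstream filter select, for instance, only strains whose type-strain status is fully confirmed.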