Daniel Roush, Ana Giraldo-Silva, Ferran Garcia-Pichel.
Abstract
Cyanobacteria are a widespread and important bacterial phylum, responsible for a significant portion of global carbon and nitrogen fixation. Unfortunately, reliable and accurate automated classification of cyanobacterial 16S rRNA gene sequences is muddled by conflicting systematic frameworks, inconsistent taxonomic definitions (including that of the phylum itself), and database errors. To address this, we introduce Cydrasil 3 (https://www.cydrasil.org), a curated 16S rRNA gene reference package, database, and web application designed to provide a full phylogenetic perspective for cyanobacterial systematics and routine identification. Cydrasil 3 contains over 1300 manually curated sequences longer than 1100 base pairs and can be used for phylogenetic placement or as a reference sequence set for de novo phylogenetic reconstructions. The web application (utilizing PaPaRA and EPA-ng) can place thousands of sequences into the reference tree and includes detailed instructions on how to analyze the results. The Cydrasil web application does not offer taxonomic assignments; instead, it provides phylogenetic placement, a searchable database with curation notes and metadata, and a mechanism for community feedback.
Year: 2021 PMID: 34475414 PMCID: PMC8413452 DOI: 10.1038/s41597-021-01015-5
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1. Process flow diagram describing Cydrasil database construction and curation for the version 3 release. Yellow shapes indicate final reference package files. Chart symbols follow American National Standards Institute (ANSI)/International Organization for Standardization (ISO) standards.
Summary statistics for major releases of Cydrasil.
| Cydrasil version | Total sequences | Cyanobacteria | Outgroup | Plastids | Sibling clades |
|---|---|---|---|---|---|
| 1 (rc1) | 982 | 980 | 0 | 0 | 2 (root) |
| 1.5 | 1494 | 1481 | 3 | 6 | 4 |
| 2 | 1482 | 1405 | 3 | 6 | 68 |
| 3 | | | | | |
1. The source distribution of sequences is as follows: 971 NCBI and 356 IMG/JGI.
2. Outgroup sequences were removed in version 3 as they are no longer needed. The WOR1 sibling clade is now used as the root.
JSON keys and descriptions for cydrasil-v3.json. All entries are strings.
| Key | Description |
|---|---|
| | New v3 sequence ID in CY-dataSource-dataSourceID format. |
| | New v3 taxonomic ID in CY-sourceName-sourceDatabaseID#g__generaName.s__speciesName.str__strainName format. |
| | Database, publication, or lab from which the sequence was retrieved. |
| | A link to the entry in the corresponding dataSource, or contact information for the submitting lab. |
| | The ID number of the sequence, initially assigned alphabetically. |
| | Notes about the sequence, including other names if strains are identical, or whether the organism is part of an outgroup. |
| | Name corresponding to the sequence in Cydrasil version 2 and earlier. |
| | The DNA sequence corresponding to the 16S rRNA gene, with no masking. |
| | Reserved for warnings regarding sequence quality, taxonomic naming errors, or other oddities. |
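
The taxonomic ID layout described in the table can be unpacked with a few string operations. The following is a minimal sketch, not part of the Cydrasil package: the function name `parse_taxonomic_id`, the returned dictionary keys, and the example ID are hypothetical, and the sketch assumes that the source name contains no hyphen and that the `g__`, `.s__`, and `.str__` markers each appear once, in that order.

```python
from typing import Dict


def parse_taxonomic_id(tax_id: str) -> Dict[str, str]:
    """Split a v3-style taxonomic ID into labelled parts.

    Assumed layout, taken from the table description:
    CY-sourceName-sourceDatabaseID#g__generaName.s__speciesName.str__strainName
    """
    seq_part, hash_sep, taxon_part = tax_id.partition("#")
    if not hash_sep:
        raise ValueError(f"missing '#' separator: {tax_id!r}")

    # Sequence part: 'CY', the source name, then the source's own ID.
    # maxsplit=2 keeps any hyphens inside the database ID intact.
    pieces = seq_part.split("-", 2)
    if len(pieces) != 3 or pieces[0] != "CY":
        raise ValueError(f"unexpected sequence prefix: {seq_part!r}")
    _, source_name, source_id = pieces

    # Taxon part: split on the rank markers rather than on every dot,
    # because names such as "sp." may themselves contain dots.
    if not taxon_part.startswith("g__"):
        raise ValueError(f"missing 'g__' marker: {taxon_part!r}")
    genus, s_sep, rest = taxon_part[len("g__"):].partition(".s__")
    species, str_sep, strain = rest.partition(".str__")
    if not s_sep or not str_sep:
        raise ValueError(f"missing '.s__' or '.str__' marker: {taxon_part!r}")

    return {
        "source_name": source_name,
        "source_id": source_id,
        "genus": genus,
        "species": species,
        "strain": strain,
    }


if __name__ == "__main__":
    # Made-up ID for illustration only; not an actual Cydrasil entry.
    example = "CY-NCBI-ABC123#g__Nostoc.s__punctiforme.str__PCC_73102"
    print(parse_taxonomic_id(example))
```

Splitting on the rank markers instead of on every period is a deliberate choice here, so that a species placeholder such as "sp." would survive intact; whether real entries use such placeholders is not stated in the source.
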
| Measurement(s) | Phylogenetic Analysis • computational phylogenetic analysis • Data Quality |
| Technology Type(s) | Maximum Likelihood Estimation • digital curation |
| Sample Characteristic - Organism | Cyanobacteria/Melainabacteria group |