Terje Klemetsen, Inge A Raknes, Juan Fu, Alexander Agafonov, Sudhagar V Balasundaram, Giacomo Tartari, Espen Robertsen, Nils P Willassen.
Abstract
We introduce the marine databases MarRef, MarDB and MarCat (https://mmp.sfb.uit.no/databases/), publicly available resources that promote marine research and innovation. These data resources, which have been implemented in the Marine Metagenomics Portal (MMP) (https://mmp.sfb.uit.no/), are collections of richly annotated and manually curated contextual (metadata) and sequence databases representing three tiers of accuracy. MarRef is a reference genome database of completely sequenced marine prokaryotic genomes, whereas MarDB includes all incompletely sequenced prokaryotic genomes regardless of their level of completeness. The last database, MarCat, represents a gene (protein) catalog of uncultivable (and cultivable) marine genes and proteins derived from marine metagenomics samples. The first versions of MarRef and MarDB contain 612 and 3726 records, respectively. Each record is built up of 106 metadata fields, including attributes for sampling, sequencing, assembly and annotation in addition to the organism and taxonomic information. Currently, MarCat contains 1227 records with 55 metadata fields. Ontologies and controlled vocabularies are used in the contextual databases to enhance consistency. The user-friendly web interface lets visitors browse, filter and search the contextual databases and perform BLAST searches against the corresponding sequence databases. All contextual and sequence databases are freely accessible and downloadable from https://s1.sfb.uit.no/public/mar/.
Year: 2018 PMID: 29106641 PMCID: PMC5753341 DOI: 10.1093/nar/gkx1036
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. General and simplified procedures for construction of the MAR databases. The top part represents the flow of contextual data records from their collection to their implementation on the web server. The bottom part illustrates how sequence data are processed and implemented. For the first release, only the metagenomic sequences related to MarCat have been processed using META-pipe.
Public data resources utilized for the construction of MarRef, MarDB and MarCat
| Type | Database | URL |
|---|---|---|
| Sequence databases | ENA, European Nucleotide Archive | |
| | UniProt, Universal Protein Resource | |
| | NCBI, National Center for Biotechnology Information | |
| Contextual databases | PATRIC, Pathosystems Resource Integration Center | |
| | GOLD, Genomes OnLine Database | |
| Taxonomic databases | SILVA, SILVA high quality ribosomal RNA database | |
| | NCBI Taxonomy browser | |
| Bacterial diversity metadatabases | BacDive, Bacterial Diversity Metadatabase | |
| Culture collection databases | DSMZ, Deutsche Sammlung von Mikroorganismen und Zellkulturen GmbH | |
| | ATCC, American Type Culture Collection | |
| Marine organisms database | WoRMS, World Register of Marine Species | |
| Web mapping service | Google maps | |
| Literature databases | Europe PMC, Europe PubMed Central | |
| | PubMed | |
| | doi, Digital Object Identifier System | |
| Ontology databases | BioPortal | |
| Standards MIGS/MIMS | GSC, Genomic Standards Consortium | |
Figure 2. Most frequently occurring marine taxa. (A) The reference database MarRef in its current state has 618 records of cellular organisms in the Archaea and Bacteria domains. Its complete and closed genomes are most prominent within the Proteobacteria phylum and the Alteromonadales order. (B) The partially curated database MarDB has 3726 records of sequenced genomes. Of its 287 unique genera (8 are shown), Vibrio is the most prominent with 467 records. These node-depleted Sankey diagrams were simplified to display only nodes exceeding 10 and 59 records for MarRef and MarDB, respectively. An exception was made for the metagenome-derived genomes of MarDB.
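The node depletion described in this caption amounts to counting records per taxon and keeping only the taxa whose count exceeds a threshold. The following is a minimal sketch of that idea, not the authors' code: the function name `prominent_taxa`, the `genus` key and the toy records are hypothetical, and the strict comparison is an assumption based on the word "exceeding".

```python
from collections import Counter


def prominent_taxa(records, rank, min_records):
    """Return {taxon: count} for taxa with strictly more than `min_records` records."""
    # Count records per taxon at the requested rank, skipping records without it.
    counts = Counter(r[rank] for r in records if r.get(rank))
    # Keep only the nodes that exceed the threshold (node depletion).
    return {taxon: n for taxon, n in counts.items() if n > min_records}


# Hypothetical toy records; real entries carry many more metadata fields.
toy_records = (
    [{"genus": "Vibrio"}] * 5
    + [{"genus": "Shewanella"}] * 3
    + [{"genus": "Alteromonas"}] * 1
)

print(prominent_taxa(toy_records, "genus", 2))
```

With the thresholds named in the caption, the call would use `min_records=10` for MarRef and `min_records=59` for MarDB; the exception for metagenome-derived genomes is not modeled here.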
Figure 3. Accessing the MAR databases and their records. From the front page of the MMP, all three metadatabases and sequence databases can be reached via the ‘Browse’ and ‘BLAST’ buttons, respectively. Browsing a metadatabase leads to the map overview before reaching its index table. Single entries can be examined by selecting them on the map or in the table.
Figure 4. The browsing interface and filtering functionality of MarRef and MarDB. (A) The default view as accessed from the corresponding database overview menu. The table content is updated instantaneously when filtering and responds to search words and 14 filtering fields. (B) Combining search words and filters narrows the listed results in a highly flexible manner. (C) The metadata of each record are separated into eight expandable categories, (D) here illustrating parts of the summary. The index of MarCat (not shown) is less comprehensive and thus has fewer filtering options.
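Combining search words with field filters, as in panel (B), can be illustrated offline on downloaded records. This is a hedged sketch only: the source does not describe how the web interface implements matching, so the AND semantics, the case-insensitive substring search, the function names `matches` and `browse`, and the field names and values in the toy records are all assumptions made for illustration.

```python
def matches(record, search_words=(), filters=None):
    """True if every search word occurs in some field value and every filter matches exactly."""
    filters = filters or {}
    # Free-text search: case-insensitive substring match over all field values.
    text = " ".join(str(v) for v in record.values()).lower()
    words_ok = all(w.lower() in text for w in search_words)
    # Field filters: exact match on the named field (assumed AND semantics).
    filters_ok = all(record.get(k) == v for k, v in filters.items())
    return words_ok and filters_ok


def browse(records, search_words=(), filters=None):
    """Return the records that satisfy both the search words and the filters."""
    return [r for r in records if matches(r, search_words, filters)]


# Hypothetical toy records with made-up field names and values.
toy = [
    {"organism": "Vibrio sp. toy1", "phylum": "Proteobacteria", "country": "X"},
    {"organism": "Vibrio sp. toy2", "phylum": "Proteobacteria", "country": "Y"},
    {"organism": "Archaeon toy3", "phylum": "Thaumarchaeota", "country": "X"},
]

hits = browse(toy, search_words=["vibrio"], filters={"country": "X"})
print([r["organism"] for r in hits])
```

Each additional search word or filter can only shrink the result list, which matches the narrowing behavior the caption describes.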