Peter D Karp, Natalia Ivanova, Markus Krummenacker, Nikos Kyrpides, Mario Latendresse, Peter Midford, Wai Kit Ong, Suzanne Paley, Rekha Seshadri.
Abstract
Microbial genome web portals offer a broad range of capabilities that address many of the information-finding and analysis needs of scientists. This article compares the capabilities of the major microbial genome web portals to help researchers determine which portal(s) are best suited to their needs. We assessed both the bioinformatics tools and the data content of BioCyc, KEGG, Ensembl Bacteria, KBase, IMG, and PATRIC. For each portal, our assessment compared and tallied the available capabilities. The strengths of BioCyc include its genomic and metabolic tools, multi-search capabilities, table-based analysis tools, regulatory network tools and data, omics data analysis tools, breadth of data content, and large amount of curated data. The strengths of KEGG include its genomic and metabolic tools. The strengths of Ensembl Bacteria include its genomic tools and large number of genomes. The strengths of KBase include its genomic tools and metabolic models. The strengths of IMG include its genomic tools, multi-search capabilities, large number of genomes, table-based analysis tools, and breadth of data content. The strengths of PATRIC include its large number of genomes, table-based analysis tools, metabolic models, and breadth of data content.
Keywords: genome databases; genome portals; microbial genome databases; microbial genomes; microbial genomics
Year: 2019 PMID: 30853946 PMCID: PMC6395428 DOI: 10.3389/fmicb.2019.00208
Source DB: PubMed Journal: Front Microbiol ISSN: 1664-302X Impact factor: 5.640
Genomics tools comparison.
| Tool | BioCyc | KEGG | Ensembl Bacteria | KBase | IMG | PATRIC |
| Genome browser | YES | YES | YES | YES | YES | YES |
| –Operons, promoters, TF binding sites | YES | no | no | no | Partial | YES |
| –Depicts nucleotide sequence | YES | YES | YES | YES | YES | YES |
| –Customizable tracks | YES | no | YES | no | Partial | YES |
| –Comparative, by orthologs | YES | no | no | no | YES | YES |
| –Genome poster | YES | no | no | no | no | no |
| Retrieve gene sequence | YES | YES | YES | YES | YES | YES |
| Retrieve replicon sequence | YES | YES | YES | no | YES | YES |
| Retrieve protein sequence | YES | YES | YES | YES | YES | YES |
| Nucleotide sequence alignment viewer | YES | YES | no | no | YES | YES |
| Protein sequence alignment viewer | YES | YES | no | no | YES | YES |
| Protein phylogenetic tree analysis | no | YES | no | YES | YES | YES |
| Sequence searching by BLAST | YES | YES | YES | YES | YES | YES |
| Sequence pattern search | YES | YES | no | YES | YES | no |
| Sequence cassette search | no | YES | YES | YES | YES | no |
| Orthologs | YES | YES | no | YES | YES | YES |
| Gene/Protein page | YES | YES | YES | YES | YES | YES |
| Enrichment analysis (GO terms) | YES | no | no | YES | no | no |
| Enrichment analysis (regulation) | YES | no | no | no | no | no |
| Omics dashboard | YES | no | no | no | no | no |
| Multi-organism comparative analysis | YES | YES | YES | YES | YES | YES |
| Horizontal gene transfer prediction | no | no | no | no | YES | no |
| Fused protein prediction | no | no | no | no | YES | no |
| Alternative ORF view | no | no | no | no | YES | YES |
| Genome multi-search | YES | no | no | no | YES | YES |
| gANI computations | no | no | no | YES | YES | YES |
| Kmer frequency analysis | no | no | no | no | YES | no |
| Synteny comparison | no | no | no | YES | YES | no |
| Proteome comparisons | YES | no | no | YES | YES | YES |
| Statistical analysis, genome | YES | no | no | no | YES | no |
| Statistical analysis, expression | no | no | no | YES | YES | YES |
| Genome function comparison | no | no | no | YES | YES | YES |
| Insert genomes into reference trees | no | no | no | YES | no | YES |
| Predict effects of sequence variants | no | no | YES | no | no | YES |
“Partial” means that the tool provides some but not all of the indicated functionality.
KEGG does have a rudimentary tool for this purpose, but it is not based on a zoomable genome browser.
PATRIC supports construction of trees from an arbitrary set of in-group and out-group genomes.
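The YES/Partial/no cells in a table like the one above can be turned into per-portal scores. The following is a minimal sketch, not the authors' procedure: it assumes the columns follow the portal order given in the abstract (BioCyc, KEGG, Ensembl Bacteria, KBase, IMG, PATRIC) and that "Partial" counts as half a point, a weighting inferred from the half-point values in the tallies table rather than stated in the text. The names `WEIGHTS`, `parse_row`, and `tally` are illustrative.

```python
# Assumed column order, taken from the order of portals in the abstract.
PORTALS = ["BioCyc", "KEGG", "Ensembl Bacteria", "KBase", "IMG", "PATRIC"]

# Assumed weights: "Partial" as 0.5 is an inference, not stated in the source.
WEIGHTS = {"yes": 1.0, "partial": 0.5, "no": 0.0}


def parse_row(line):
    """Split one pipe-delimited table row into (label, [cell, ...])."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    return cells[0], cells[1:]


def tally(rows):
    """Sum the weighted cell values per portal over the given rows."""
    totals = {portal: 0.0 for portal in PORTALS}
    for line in rows:
        _, cells = parse_row(line)
        for portal, cell in zip(PORTALS, cells):
            # Use only the first word so "YES (Desktop)" still counts as yes.
            key = cell.split()[0].lower() if cell else "no"
            totals[portal] += WEIGHTS.get(key, 0.0)
    return totals


# Three rows copied from the genomics tools table, used as a small excerpt.
rows = [
    "| Genome browser | YES | YES | YES | YES | YES | YES |",
    "| –Operons, promoters, TF binding sites | YES | no | no | no | Partial | YES |",
    "| –Customizable tracks | YES | no | YES | no | Partial | YES |",
]

excerpt_totals = tally(rows)
for portal, score in excerpt_totals.items():
    print(f"{portal}: {score}")
```

On this three-row excerpt, IMG scores 2.0 (one YES plus two Partial cells) and BioCyc scores 3.0.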
Gene/protein multi-search capabilities.
| Search field | BioCyc | KEGG | Ensembl Bacteria | KBase | IMG | PATRIC |
| Gene name | YES | YES | YES | YES | YES | YES |
| Product name | YES | YES | YES | YES | YES | YES |
| Database identifier | YES | YES | YES | YES | YES | YES |
| EC number | YES | YES | YES | no | YES | YES |
| Sequence length | YES | no | no | YES | YES | YES |
| Replicon | YES | no | no | YES | YES | YES |
| Map position | YES | YES | no | YES | YES | no |
| Product mol wt | YES | no | no | no | YES | no |
| Product subunits | YES | no | no | no | YES | no |
| Product pI | YES | no | no | no | YES | no |
| Product ligands | YES | no | no | no | YES | no |
| Evidence code | YES | no | no | no | no | no |
| Cell component | YES | no | no | no | no | no |
| GO terms | YES | no | YES | YES | YES | YES |
| Protein features | YES | no | YES | no | YES | no |
| Publication | YES | no | no | YES | no | no |
| Scaffold length | no | YES | no | YES | YES | no |
| Scaffold GC content | no | no | no | no | YES | YES |
| Protein family assignment | no | YES | YES | no | YES | YES |
| Is partial | no | no | no | no | YES | no |
| Is pseudogene | YES | no | no | no | YES | YES |
Does the portal support multi-searches for genes and gene products based on the data fields or criteria listed? “Publication” means the ability to search for a gene based on a publication cited in the pathway entry. “Scaffold Length” means the ability to search for a gene based on the length of the scaffold it resides on. “Protein Family Assignment” means the ability to search for a gene based on what protein families it is assigned to (e.g., Pfam or TIGRFAM family). “Is Partial” means search for partial (truncated) proteins.
DNA/RNA site multi-search capabilities.
| Search field | BioCyc | KEGG | Ensembl Bacteria | KBase | IMG | PATRIC |
| Site type | YES | no | no | no | no | no |
| –Attenuators | YES | no | no | no | no | no |
| –Origin of replication | YES | no | no | no | no | no |
| –Phage attachment sites | YES | no | no | no | no | no |
| –REP elements | YES | no | no | no | no | no |
| –Promoters | YES | no | no | no | no | no |
| –Terminators | YES | no | no | no | no | no |
| –mRNA binding sites | YES | no | no | no | YES | no |
| –Riboswitches | YES | no | no | no | YES | no |
| –TF binding sites | YES | no | no | no | no | no |
| –Transcription units | YES | no | no | no | no | no |
| –Transposons | YES | no | no | no | no | no |
| Replicon | YES | no | no | no | YES | no |
| Map position | YES | no | no | no | YES | no |
| Site regulator | YES | no | no | no | no | no |
| Site ligands | YES | no | no | no | no | no |
| Evidence code | YES | no | no | no | no | no |
| CRISPR arrays | no | no | no | no | YES | no |
Does the portal support multi-searches for DNA and RNA sites based on the data fields or criteria listed? For example, does the portal support searches for sites by the type of site (e.g., for attenuators vs. transcription-factor binding sites), and by numeric constraints on the genome position of the site?
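To make the idea of a multi-search concrete, the sketch below combines a site-type constraint with numeric constraints on genome position. It is only an illustration of the concept: the `Site` record, the `multi_search` function, and the sample records are all hypothetical and do not reflect any portal's actual data model or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical DNA/RNA site record (fields chosen for illustration)."""
    name: str
    site_type: str
    replicon: str
    position: int


def multi_search(sites, site_types=None, replicon=None, min_pos=None, max_pos=None):
    """Return sites matching every constraint that was supplied.

    A constraint left as None is ignored, so criteria can be combined freely.
    """
    hits = []
    for site in sites:
        if site_types is not None and site.site_type not in site_types:
            continue
        if replicon is not None and site.replicon != replicon:
            continue
        if min_pos is not None and site.position < min_pos:
            continue
        if max_pos is not None and site.position > max_pos:
            continue
        hits.append(site)
    return hits


# Made-up records, purely for demonstration.
SITES = [
    Site("site-1", "attenuator", "chromosome", 1200),
    Site("site-2", "tf-binding-site", "chromosome", 5400),
    Site("site-3", "tf-binding-site", "plasmid", 300),
    Site("site-4", "terminator", "chromosome", 9800),
]

hits = multi_search(
    SITES,
    site_types={"tf-binding-site"},
    replicon="chromosome",
    min_pos=1000,
    max_pos=6000,
)
hit_names = [site.name for site in hits]
print(hit_names)
```

Here only `site-2` satisfies all three criteria (type, replicon, and position range) at once.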
Metabolic tools comparison.
| Tool | BioCyc | KEGG | Ensembl Bacteria | KBase | IMG | PATRIC |
| Metabolite page | YES | YES | no | no | no | no |
| Chemical similarity search | no | YES | no | no | no | no |
| Glycan similarity search | no | YES | no | no | no | no |
| Reaction page | YES | YES | no | no | YES | no |
| –Reaction atom mappings | YES | YES | no | no | no | no |
| Individual pathway diagram | YES | YES | no | YES | YES | YES |
| –Automatic pathway layout | YES | no | no | no | no | no |
| –Paint omics data onto pathway | YES | YES | no | no | YES | no |
| –Depict enzyme regulation | YES | no | no | no | no | no |
| –Depict genetic regulation | YES | no | no | no | no | no |
| –Depict metabolite structures | YES | YES (Tooltip) | no | no | no | no |
| Multi-pathway diagram | YES | no | no | no | no | no |
| Full metabolic network diagram | YES | YES | no | no | no | no |
| –Zoomable metabolic network | YES | YES | no | no | no | no |
| –Paint omics data onto diagram | YES | no | no | no | no | no |
| –Animated omics data painting | YES | no | no | no | no | no |
| –Metabolic poster | YES | no | no | no | no | no |
| –Organism comparison | YES | no | no | no | no | no |
| Automated metabolic reconstruction | YES (Desktop) | YES | no | YES | YES | YES |
| Enrichment analysis (Pathways) | YES | no | no | no | YES | no |
| Execute metabolic model | YES | no | no | YES | no | YES |
| –Gene knock-out analysis | YES | no | no | YES | no | YES |
| Chokepoint analysis | YES | no | no | no | no | no |
| Dead-end metabolite analysis | YES | no | no | no | no | no |
| Blocked-reaction analysis | YES | no | no | YES | no | no |
| Route search tool | YES | YES | no | no | no | no |
| Path prediction tool | no | YES | no | no | no | no |
| Assign EC number | no | YES | no | no | no | no |
The desktop version of the Pathway Tools software performs automated metabolic reconstruction.
Compound multi-search capabilities.
| Search field | BioCyc | KEGG | Ensembl Bacteria | KBase | IMG | PATRIC |
| Name | YES | YES | no | no | YES | YES |
| Database identifier | YES | YES | no | no | YES | YES |
| Ontology | YES | no | no | no | YES | YES |
| Monoisotopic mass | YES | no | no | no | Partial | no |
| Molecular weight | YES | no | no | no | Partial | no |
| Chemical formula | YES | no | no | no | Partial | no |
| Chemical substructure | YES | YES | no | no | Partial | no |
| InChi string | YES | no | no | no | Partial | no |
| InChi key | YES | no | no | no | Partial | no |
Does the portal support multi-searches for chemical compounds based on the data fields or criteria listed? “Ontology” means the ability to search for compounds based on a chemical ontology (classification).
This search will find pages of antimicrobial compounds.
Pathway multi-search capabilities.
| Search field | BioCyc | KEGG | Ensembl Bacteria | KBase | IMG | PATRIC |
| Name | YES | YES | no | no | YES | YES |
| Ontology | YES | YES | no | no | YES | YES |
| Size in reactions | YES | no | no | no | no | no |
| Substrates | YES | YES | no | no | YES | no |
| Evidence code | YES | no | no | no | no | no |
| Publication | YES | no | no | no | no | no |
Does the portal support multi-searches for pathways based on the data fields or criteria listed? “Ontology” means the ability to search for pathways based on a pathway ontology (classification).
Comparison of advanced search and analysis, web services, and user accounts.
| Capability | BioCyc | KEGG | Ensembl Bacteria | KBase | IMG | PATRIC |
| Advanced search | YES | no | no | no | YES | no |
| Cross-organism search | YES | YES | YES | Partial | YES | YES |
| Web services | YES | YES | YES | YES | no | no |
| Other query options | * | * | * | * | * | * |
| User account | Opt/req | no | Optional | Required | Opt/req | Opt/req |
| Custom notifications | YES | no | no | no | no | no |
| Download formats | Biopax,gff | Json,sbml | Fasta,gff,gff3 | Genbank,gff,tsv | Fasta,txt | Csv,fasta,gff |
| genbank | json,mysql,rdf | fasta,json,sbml | embl,json | |||
| sbml | genbank |
“Opt/Req” means that user accounts are optional for some operations and required for other operations. IMG also provides for downloading of reads, assemblies, QC reports, annotations, and more.
Table-based analysis capabilities.
| Capability | BioCyc | KEGG | Ensembl Bacteria | KBase | IMG | PATRIC |
| Genomes | no | no | no | no | no | YES |
| Genes | YES | no | no | no | YES | YES |
| Proteins | YES | no | no | no | YES | YES |
| RNAs | YES | no | no | no | YES | YES |
| Metabolites | YES | no | no | no | Partial | no |
| Pathways | YES | no | no | no | Partial | YES |
| Reactions | YES | no | no | no | Partial | no |
| Promoters | YES | no | no | no | no | no |
| Terminators | YES | no | no | no | no | no |
| Transcription factor binding sites | YES | no | no | no | no | no |
| Transcription units | YES | no | no | no | Partial | no |
| Publications | YES | no | no | no | no | no |
| Transcriptomics experiments | no | no | no | no | Partial | YES |
| Biosynthetic clusters | no | no | no | no | YES | no |
| Protein families | no | no | no | no | no | YES |
| Create table from uploaded file | YES | no | no | no | YES | YES |
| Create table from database query result | YES | no | no | no | YES | YES |
| Include database properties as table columns | YES | no | no | no | YES | YES |
| Create columns as computational transformations | YES | no | no | no | no | no |
| Set operations among tables | YES | no | no | no | YES | YES |
| Filter table rows | YES | no | no | no | YES | YES |
| Export table to file | YES | no | no | no | YES | YES |
| Share table with selected users | YES | no | no | no | YES | YES |
| Share table to the public | YES | no | no | no | no | YES |
PATRIC provides tables of genomes and tables of features (defined sections of a genome, e.g., genes, CDS, mRNAs).
Data types comparison.
| Data type | BioCyc | KEGG | Ensembl Bacteria | KBase | IMG | PATRIC |
| Genomes | 14,560 | 5,130 | 44,046 | 122,688 | 97,179 | 184,000 |
| Bacterial genomes | 14,134 | 4,854 | 43,552 | 121,994 | 66,362 | 181,260 |
| Archaeal genomes | 394 | 276 | 494 | 694 | 1,724 | 2,881 |
| Uncultivated organisms | 0 | 11,466 | 0 | |||
| Genome metadata | YES | YES | no | no | YES | YES |
| Regulatory networks | 11 | no | no | no | no | no |
| Protein localization | YES | no | no | no | no | no |
| Protein features | YES | no | YES | no | Partial | YES |
| Protein 3-D structures | no | YES | no | no | no | no |
| GO terms | YES | no | YES | YES | YES | YES |
| Evidence codes | YES | no | no | no | YES | Partial |
| Operons | YES | no | no | no | no | YES |
| Prophages | YES | no | no | no | YES | YES |
| Growth media | YES | no | no | YES | no | no |
| Gene essentiality | YES | no | no | no | no | YES |
| Gene clusters for secondary metabolites | no | no | no | no | YES | no |
| Gene pairs with correlated expression | no | no | no | no | no | YES |
| Protein-protein interactions | no | no | no | no | no | YES |
| AMR phenotypes | no | no | no | no | no | YES |
PATRIC includes evidence codes in only two DB tables.
User experience features.
| Feature | BioCyc | KEGG | Ensembl Bacteria | KBase | IMG | PATRIC |
| Gene page load time (s) | 4.4 | 2.5 | 10.0 | 9.8 | 13.5 | 34.9 |
| Tooltips | YES | no | YES | YES | YES | YES |
| User guide | YES | YES | YES | YES | YES | YES |
| Webinars | YES | no | YES | YES | YES | YES |
| Workshops | YES | ? | YES | YES | YES | YES |
The extent of gene details and visualizations displayed differs greatly among sites and can lead to longer page load times.
Tallies of portal capabilities from previous tables.
| Category | BioCyc | KEGG | Ensembl Bacteria | KBase | IMG | PATRIC |
| Genome | 22 | 14 | 11 | 18 | 27 | 23 |
| Metabolic | 24 | 14 | 0 | 7 | 5 | 4 |
| Regulatory | 7 | 0 | 0 | 0 | 0 | 0 |
| Advanced | 5 | 2 | 3 | 2.5 | 3 | 2 |
| Tables | 20 | 0 | 0 | 0 | 13.5 | 15 |
| Multi-search | 49 | 12 | 7 | 10 | 32 | 15 |
| Data Types | 10 | 2 | 2 | 2 | 5.5 | 9.5 |
| Totals (excl Multi) | 88 | 32 | 16 | 29.5 | 54 | 53.5 |
Row “Genome” summarizes the major capabilities for genomics tools present in the genomics tools comparison table.
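The "Totals (excl Multi)" row can be checked by summing every category except Multi-search. The sketch below does that arithmetic on the values copied from the tallies table. The portal order is assumed to follow the abstract, and the names `TALLIES` and `totals_excluding` are illustrative.

```python
# Assumed column order, taken from the order of portals in the abstract.
PORTALS = ["BioCyc", "KEGG", "Ensembl Bacteria", "KBase", "IMG", "PATRIC"]

# Category tallies copied from the table above, one value per portal.
TALLIES = {
    "Genome": [22, 14, 11, 18, 27, 23],
    "Metabolic": [24, 14, 0, 7, 5, 4],
    "Regulatory": [7, 0, 0, 0, 0, 0],
    "Advanced": [5, 2, 3, 2.5, 3, 2],
    "Tables": [20, 0, 0, 0, 13.5, 15],
    "Multi-search": [49, 12, 7, 10, 32, 15],
    "Data Types": [10, 2, 2, 2, 5.5, 9.5],
}


def totals_excluding(tallies, excluded=("Multi-search",)):
    """Sum the tallies column by column, skipping the excluded categories."""
    sums = [0.0] * len(PORTALS)
    for category, values in tallies.items():
        if category in excluded:
            continue
        for i, value in enumerate(values):
            sums[i] += value
    return dict(zip(PORTALS, sums))


totals = totals_excluding(TALLIES)
for portal, total in totals.items():
    print(f"{portal}: {total}")
```

The computed sums (88, 32, 16, 29.5, 54, and 53.5) match the totals row of the table.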
Figure 1. Spider plot of the data in Table 11, excluding the Multi-Search row to enhance resolution.