Malo Le Boulch, Patrice Déhais, Sylvie Combes, Géraldine Pascal.
Abstract
Progress in genome sequencing and bioinformatics opens up new possibilities, including that of correlating genome annotations with functional information such as metabolic pathways. Thanks to the development of functional annotation databases, scientists are able to link genome annotations with functional annotations. Here we present the MetAboliC pAthways DAtabase for Microbial taxonomic groups (MACADAM), a user-friendly database that makes it possible to find presence/absence/completeness statistics for metabolic pathways at a given microbial taxonomic position. For each prokaryotic 'RefSeq complete genome', MACADAM builds a pathway genome database (PGDB) using Pathway Tools software based on MetaCyc data that includes metabolic pathways as well as associated metabolites, reactions and enzymes. To ensure the highest quality of the genome functional annotation data, MACADAM also contains MicroCyc, a manually curated collection of PGDBs; Functional Annotation of Prokaryotic Taxa (FAPROTAX), a manually curated functional annotation database; and the IJSEM phenotypic database. The MACADAM database contains 13 509 PGDBs (13 195 bacterial and 314 archaeal) and 1260 unique metabolic pathways, complemented by 82 functional annotations from FAPROTAX and 16 from the IJSEM phenotypic database. MACADAM contains a total of 7921 metabolites, 592 enzymatic reactions, 2134 EC numbers and 7440 enzymes. MACADAM can be queried at any rank of the NCBI taxonomy (from phylum to species). It lets users explore functional information together with the associated metabolites, enzymes, enzymatic reactions and EC numbers. MACADAM returns a tabulated file listing the pathways present in the queried taxa, each with two scores: the pathway score (PS) and the pathway frequency score (PFS). The file also contains the names of the organisms in which the pathways are found and the metabolic hierarchy associated with the pathways.
Finally, MACADAM can be downloaded as a single file and queried with SQLite or Python command lines or explored through a web interface.
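As an illustration of what querying a single-file SQLite database by taxonomic rank can look like from Python, here is a minimal sketch. The schema (`organism`, `pathway`), the column names, the helper `pathways_for_taxon` and all rows are hypothetical toy values chosen for the example; they are not MACADAM's actual tables or data.

```python
import sqlite3

# Hypothetical schema and toy rows: MACADAM's real tables, columns and
# values may differ; inspect the downloaded file before adapting this.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE organism (id INTEGER PRIMARY KEY, species TEXT, genus TEXT);
    CREATE TABLE pathway (organism_id INTEGER, pathway_name TEXT, ps REAL, pfs REAL);
""")
conn.executemany(
    "INSERT INTO organism VALUES (?, ?, ?)",
    [(1, "Species A1", "Genus A"),
     (2, "Species A2", "Genus A"),
     (3, "Species B1", "Genus B")],
)
conn.executemany(
    "INSERT INTO pathway VALUES (?, ?, ?, ?)",
    [(1, "toy pathway X", 1.0, 1.5),
     (2, "toy pathway X", 0.5, 0.5),
     (3, "toy pathway Y", 0.25, 0.25)],
)

ALLOWED_RANKS = {"species", "genus"}


def pathways_for_taxon(conn, rank, name):
    """Return (pathway, organism, PS, PFS) rows for every organism under a taxon."""
    # The rank becomes a column name, so it is checked against a whitelist
    # before being placed in the SQL text; the taxon name is a bound parameter.
    if rank not in ALLOWED_RANKS:
        raise ValueError(f"unsupported rank: {rank!r}")
    sql = f"""
        SELECT p.pathway_name, o.species, p.ps, p.pfs
        FROM pathway AS p
        JOIN organism AS o ON o.id = p.organism_id
        WHERE o.{rank} = ?
        ORDER BY p.pathway_name, o.species
    """
    return conn.execute(sql, (name,)).fetchall()


for row in pathways_for_taxon(conn, "genus", "Genus A"):
    print("\t".join(str(value) for value in row))
```

Printing tab-separated rows mirrors the tabulated output described in the abstract; a real query would read the downloaded database file instead of an in-memory one.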
Year: 2019 PMID: 31032842 PMCID: PMC6487390 DOI: 10.1093/database/baz049
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Overview of MACADAM, BioCyc, PATRIC and KEGG database features with a focus on metabolic pathway and functional information among prokaryotic organisms
| MACADAM | BioCyc | PATRIC | KEGG |
|---|---|---|---|
| NCBI taxonomy | NCBI taxonomy | NCBI taxonomy | KEGG taxonomy (with cross-link to NCBI) |
| On one or several taxonomies or organisms, with few filters | On one organism, with multiple filters | On one or several taxonomies or organisms, with multiple filters | On one organism, with no filters |
| 13 195 | ~13 400 | 198 855 | 5014 |
| 314 | ~400 | 3069 | 285 |
| 1260 | 2666 | 143 | 530 |
| RefSeq (complete genomes) | Genbank and RefSeq | Genbank and RefSeq | Genbank and RefSeq |
| RefSeq (functional annotations), MetaCyc (metabolic pathways), MicroCyc (metabolic pathways), FAPROTAX (functional features), IJSEM PhenoDB (phenotypic data) | Genbank/RefSeq (annotations) and MetaCyc (metabolic pathways) | Genbank/RefSeq, KEGG (metabolic pathways) | KEGG |
| PS and PFS | SmartTables, genome browser, omics data analysis, metabolic models and routes and comparative analysis | KEGG pathway map, comparative pathway heatmap, multiple sequence alignment, enzymes and genes conservation in pathway | KEGG mapper tools |
| Metabolic pathway name, pathway class hierarchy, hyperlink to MetaCyc pathway and functional information of the upper rank for taxa without data | Metabolic pathway name, pathway class hierarchy, pathway map including enzymes and metabolites, associated genes, protein associated with pathways and literature references | Metabolic pathway name, pathway class hierarchy, KEGG pathway map including enzymes and metabolites, associated genes, enzyme and gene evolution data | Metabolic pathway name, pathway class hierarchy, KEGG pathway map including enzymes and metabolites, associated genes and literature references |
| Yes | Yes for academics | No | Yes with license |
| Yes | Yes | Yes | Available on the web only |
| SQLite or Python script | Application Programming Interface (API) | API + free command line software | API |
| 6 months | 2–6 months | 6 months | 3 months |
Figure 1. MACADAM building workflow.
Statistics on PGDBs collected for MACADAM (values are mean ± SD; values in brackets are minimum and maximum values)
| PGDB source | Number of PGDBs | | | | |
|---|---|---|---|---|---|
| Bacteria | | | | | |
| MetaCyc PGDBs | 9954 | 156 ± 60 [3–350] | 851 | 0.85 ± 0.21 [0–1] | 1.37 ± 1.21 [0–51] |
| MicroCyc PGDBs that have replaced a MetaCyc PGDB | 1560 | 247 ± 80 [26–425] | 1012 | 0.75 ± 0.28 [0.344–1] | 1.37 ± 1.45 [0.344–77] |
| PGDBs only present in MicroCyc | 1681 | 255 ± 70 [2–422] | 1007 | 0.75 ± 0.28 [0.344–1] | 1.32 ± 1.28 |
| Archaea | | | | | |
| MetaCyc PGDBs | 184 | 60 ± 14 [16–91] | 207 | 0.84 ± 0.22 [0.2–1] | 1.35 ± 1.11 [0.2–15] |
| MicroCyc PGDBs that have replaced a MetaCyc PGDB | 96 | 107 ± 25 [2–156] | 393 | 0.77 ± 0.28 [0.05–1] | 1.28 ± 1.12 [0.05–21] |
| PGDBs only present in MicroCyc | 34 | 98 ± 22 [8–149] | 344 | 0.77 ± 0.28 [0.05–1] | 1.21 ± 0.93 [0.05–16] |
in bold: summary of metrics for bacteria, archaea or both.
Figure 2. At the top, formulas for the computation of the PS and PFS. Below, examples of two types of computations of the PS and PFS based on an example of a hypothetical metabolic pathway. By comparing the PS and PFS in A and B, pathway A shows greater evidence of its veracity.
Figure 3. Comparison of PS of all metabolic pathways of PGDBs present in both MetaCyc PGDBs and MicroCyc PGDBs. A: among Bacteria; B: among Archaea.
Statistics on FAPROTAX and IJSEM phenotypic database information in the MACADAM database for bacterial and archaeal organisms
| | | |
|---|---|---|
| FAPROTAX | | |
| IJSEM phenotypic database | | |
Figure 4. MACADAM database schema. Yellow arrows indicate the entry points of the database.
Figure 5. Screenshot of MACADAM Explore website showing the query of all functional information containing the word ‘urea’ in the species ‘Staphylococcus aureus’ and ‘Kitasatospora aureofaciens’.
Figure 6. Examples of output files corresponding to requests on MACADAM. (A) The user has searched for all metabolic pathways in S. aureus and K. aureofaciens, using the term ‘urea’ in the function field text. (B) The user has searched for all metabolic pathways in Lactobacillus cerevisiae using the term ‘urea’ in the function field text. Since there is no data on this organism in MACADAM, the information was searched for higher up in the taxonomy hierarchy, i.e. Lactobacillus. (1) List of organisms in MACADAM with the targeted metabolic pathway.
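The fallback described in panel B, where a taxon without data is answered from the next rank up, can be sketched as a walk up a child-to-parent map. This is only an illustration under stated assumptions: `PARENT`, `PATHWAYS`, `lookup_with_fallback` and the pathway name are hypothetical toy stand-ins, not MACADAM's code or data.

```python
# Toy child -> parent map standing in for the NCBI taxonomy (assumption:
# only the few lineage entries needed for the example are listed).
PARENT = {
    "Lactobacillus cerevisiae": "Lactobacillus",
    "Lactobacillus": "Lactobacillaceae",
    "Lactobacillaceae": "Lactobacillales",
}

# Toy pathway data keyed by taxon name; the pathway name is a placeholder.
PATHWAYS = {
    "Lactobacillus": ["toy urea-related pathway"],
}


def lookup_with_fallback(taxon, parent=PARENT, pathways=PATHWAYS):
    """Return (taxon_used, pathways) for the queried taxon or its nearest
    ancestor that has data; (None, []) if nothing is found up the lineage."""
    current = taxon
    while current is not None:
        if current in pathways:
            return current, pathways[current]
        # Move one rank up; .get() yields None once the map runs out.
        current = parent.get(current)
    return None, []


taxon_used, found = lookup_with_fallback("Lactobacillus cerevisiae")
print(taxon_used, found)
```

Returning the taxon that actually supplied the data lets the caller report that the answer comes from a higher rank rather than from the queried organism itself.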
Figure 7. Phyla distribution in the MACADAM database according to their database of origin. (A) MetaCyc and MicroCyc for bacterial organisms, (B) MetaCyc and MicroCyc for archaeal organisms, (C) FAPROTAX for bacterial organisms, (D) FAPROTAX for archaeal organisms, (E) IJSEM phenotypic database for bacterial organisms and (F) IJSEM phenotypic database for archaeal organisms.
Figure 8. MACADAM functional diversity for phyla with >10 PGDBs in MACADAM: (A) among bacterial phyla, (B) among archaeal phyla, (C) the 10 main hierarchical groups of pathways in all bacterial organisms and (D) the 10 main hierarchical groups of pathways in all archaeal organisms.
Statistics on the L-lysine fermentation to acetate and butanoate pathway (MetaCyc ID is P163-PWY; values in brackets are minimum and maximum values)
| | |
|---|---|
| Number of reactions in the complete pathway* | 10 |
| Number of bacteria in which this pathway is present | 756 |
| Median number of unique enzymes present in this pathway in organisms | 4 [1–10] |
| Median number of enzymes present in this pathway in organisms | 7 [1–70] |
| Key reaction* | 1 (enzyme classification: 5.4.3.2) |
| PS | 0.4 [0.1–1] |
| PFS | 0.7 [0.1–7] |
*Value found in a MetaCyc flat file; a key reaction is a reaction that is specific to a single pathway, i.e. a reaction that is not found in any other pathway.
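The medians in this table are consistent with reading the PS as the number of unique enzymes divided by the number of reactions in the complete pathway (4/10 = 0.4) and the PFS as the total number of enzymes divided by the same denominator (7/10 = 0.7). The sketch below encodes that reading. It is an assumption inferred from the table values, not a formula quoted from the source, and `pathway_scores` is a hypothetical helper name.

```python
def pathway_scores(n_reactions, n_unique_enzymes, n_enzymes):
    """Compute (PS, PFS) under the assumed definitions:
    PS  = unique enzymes found / reactions in the complete pathway
    PFS = all enzymes found    / reactions in the complete pathway
    """
    if n_reactions <= 0:
        raise ValueError("a pathway needs at least one reaction")
    if not 0 <= n_unique_enzymes <= n_enzymes:
        raise ValueError("unique enzymes cannot exceed total enzymes")
    ps = n_unique_enzymes / n_reactions
    pfs = n_enzymes / n_reactions
    return ps, pfs


# Median values reported for P163-PWY: 10 reactions, 4 unique enzymes, 7 enzymes.
print(pathway_scores(10, 4, 7))
# Upper bounds reported in brackets: 10 unique enzymes, 70 enzymes.
print(pathway_scores(10, 10, 70))
```

Under this reading the PS is capped at 1 whenever there is at most one unique enzyme per reaction, whereas the PFS can exceed 1 when several enzymes are annotated for the same reaction, which matches the reported ranges of [0.1–1] and [0.1–7].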