João Pedro Saraiva, Alexandre Bartholomäus, René Kallies, Marta Gomes, Marcos Bicalho, Jonas Coelho Kasmanas, Carsten Vogt, Antonis Chatzinotas, Peter Stadler, Oscar Dias, Ulisses Nunes da Rocha.
Abstract
The high complexity found in microbial communities makes the identification of microbial interactions challenging. To address this challenge, we present OrtSuite, a flexible workflow to predict putative microbial interactions based on the genomic content of microbial communities and targeted to specific ecosystem processes. The pipeline is composed of three user-friendly bash commands. OrtSuite combines ortholog clustering with genome annotation strategies limited to user-defined sets of functions, allowing for hypothesis-driven data analysis such as assessing microbial interactions in specific ecosystems. OrtSuite matched, on average, 96% of experimentally verified KEGG orthologs involved in benzoate degradation in a known group of benzoate degraders. We evaluated the identification of putative synergistic species interactions using the sequenced genomes of an independent study that had previously proposed potential species interactions in benzoate degradation. OrtSuite is an easy-to-use workflow that allows for rapid functional annotation based on a user-curated database and can easily be extended to ecosystem processes where connections between genes and reactions are known. OrtSuite is open-source software available at https://github.com/mdsufz/OrtSuite.
Year: 2021 PMID: 34580179 PMCID: PMC8500227 DOI: 10.26508/lsa.202101167
Source DB: PubMed Journal: Life Sci Alliance ISSN: 2575-1077
Figure 1. OrtSuite workflow.
OrtSuite takes a user-supplied text file listing an identifier for each reaction in the pathway of interest and retrieves all associated protein sequences from KEGG Orthology, which are stored in ORAdb. Subsequently, the same list of identifiers is used to obtain the gene-protein-reaction (GPR) rules from KEGG modules (Task 1). Protein sequences from samples supplied by the user are clustered using OrthoFinder (Task 2). In Task 3, the functional annotation, the identification of putative synergistic species interactions, and the graphical visualization of the network are performed. The functional annotation is a two-stage process (relaxed and restrictive search). The relaxed search aligns a randomly selected 50% of the sequences from each generated cluster against the reaction sets in ORAdb. Clusters whose representative sequences hit sequences in the reaction set(s) in ORAdb at an E-value threshold of 0.001 continue to the restrictive search. Here, all sequences from the cluster are aligned to all sequences in the corresponding reaction set(s) to which they had a hit (default E-value = 1 × 10⁻⁹). Next, the annotated sequences are further filtered to those with a bit score greater than 50 and are used to identify putative microbial interactions based on their functional potential. Constraints can also be added to reduce the search space of microbial interactions (e.g., subsets of reactions required to be performed by a single species, transport-related reactions). In addition, an interactive network visualization of the results is produced and can be accessed via an HTML file.
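The two-stage annotation described in this caption can be illustrated with a minimal Python sketch. It is not OrtSuite's own code: the `Hit` record, the function names, and the assumption that alignment results are already available as a list of hits are all hypothetical, and the thresholds are simply the defaults quoted above, read here as upper bounds on the E-value.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    """One alignment result (hypothetical record, not an OrtSuite type)."""
    query: str       # sequence ID from an ortholog cluster
    reaction: str    # reaction set in the reference database
    evalue: float
    bitscore: float

# Defaults quoted in the caption; treated as assumptions about the cutoffs.
RELAXED_EVALUE = 1e-3
RESTRICTIVE_EVALUE = 1e-9
MIN_BITSCORE = 50.0

def sample_representatives(cluster, fraction=0.5, seed=0):
    """Randomly pick a fraction of a cluster's sequences (at least one)."""
    k = max(1, round(len(cluster) * fraction))
    return random.Random(seed).sample(sorted(cluster), k)

def relaxed_reactions(representatives, hits):
    """Relaxed stage: reaction sets hit by any representative sequence."""
    reps = set(representatives)
    return {h.reaction for h in hits
            if h.query in reps and h.evalue <= RELAXED_EVALUE}

def restrictive_annotations(cluster, candidate_reactions, hits):
    """Restrictive stage: keep hits from all cluster members to the
    candidate reaction sets that pass the E-value and bit score filters."""
    members = set(cluster)
    return [h for h in hits
            if h.query in members
            and h.reaction in candidate_reactions
            and h.evalue <= RESTRICTIVE_EVALUE
            and h.bitscore > MIN_BITSCORE]
```

In this sketch, a cluster only reaches the restrictive stage if `relaxed_reactions` returns a non-empty set, which mirrors the idea that the cheap relaxed pass limits how many full cluster-versus-reaction-set alignments are needed.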
Species names, strain and abbreviation codes of the genomes used to validate OrtSuite (Supplementary data - Test_genome_set).
| Name and strain | Abbreviation code | KEGG id | BTA pathway | Accession number | Reference |
|---|---|---|---|---|---|
| | adv | T05474 | P3 | | |
| | ath | T00041 | — | | * |
| | aza | T02502 | P2 | | |
| | azd | T05691 | P2 | | |
| | azi | T04019 | P2 | | |
| | bced | T03302 | P3 | | |
| | bvi | T00493 | P3 | | |
| | cyq | T02265 | P3 | | |
| | cza | T02780 | P3 | | |
| | dor | T01675 | — | | |
| | eba | T00222 | P2 | | |
| | lcm | T02913 | — | | * |
| | magx | T04231 | P2 | | |
| | parb | T05169 | P3 | | |
| | rrz | T05142 | P3 | | |
| | shd | T03591 | P2 | | |
| | sscu | T05176 | — | | |
| | tmz | T00804 | P2, P3 | | |
The genomic potential, based on the KEGG database, to completely encode all proteins involved in a BTA pathway is indicated in the column “BTA pathway” (P1: anaerobic conversion of benzoate to acetyl-CoA 1; P2: anaerobic conversion of benzoate to acetyl-CoA 2; P3: aerobic conversion of benzoate to acetyl-CoA). * indicates that no literature was found connecting benzoate degradation and the respective species.
OrtSuite workflow runtime and clustering performance.
| OrtSuite step or metric | Runtime or value |
|---|---|
| ORAdb construction and Generation of GPR_rules | 2 h 47 min |
| Generation of protein ortholog clusters | 54 min |
| Functional annotation of sequences in ortholog clusters | 6 min |
| Defining putative microbial interactions | 3 min |
| Total | 3 h 50 min |
| Precision (BLAST) | 0.63 |
| Recall (BLAST) | 0.77 |
| Precision (DIAMOND) | 0.64 |
| Recall (DIAMOND) | 0.85 |
Runtimes are reported for each OrtSuite step when analyzing the genomic potential of species in the Test_genome_set dataset for three pathways (P1, P2, and P3) for the conversion of benzoate to acetyl-CoA (BTA). Steps were performed with default parameters on a laptop with four cores and 16 GB of RAM. Precision and recall are pair-wise results for OrthoFinder using BLAST or DIAMOND as the alignment search tool. Clustering was performed on the Test_genome_set dataset plus the mutated genomes.
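The caption does not spell out how pair-wise precision and recall were computed. A common definition, assumed here, counts every pair of sequences placed in the same cluster and compares the predicted pairs with those of a reference clustering. The function names below are hypothetical and the sketch is only meant to make the metric concrete.

```python
from itertools import combinations

def co_clustered_pairs(clusters):
    """Return the set of unordered sequence pairs that share a cluster."""
    pairs = set()
    for members in clusters:
        for a, b in combinations(sorted(set(members)), 2):
            pairs.add((a, b))
    return pairs

def pairwise_precision_recall(predicted, reference):
    """Pair-wise precision and recall of a predicted clustering
    against a reference clustering (assumed definition)."""
    pred = co_clustered_pairs(predicted)
    ref = co_clustered_pairs(reference)
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    return precision, recall
```

Under this definition, precision drops when unrelated sequences are merged into one cluster, and recall drops when true orthologs are split across clusters.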
Figure 2. Mapping of the Fetzer genome set to benzoate pathways.
Mapping of the genomic potential of each species from the Fetzer_genome_set dataset to each reaction in aerobic (yellow) and anaerobic (blue) benzoate-to-acetyl-CoA conversion pathways. Circles highlighted in green represent species that showed biomass growth in medium containing benzoate in the Fetzer study.
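Given a species-to-reaction mapping like the one in this figure, candidate synergistic combinations can be enumerated by asking which small groups of species jointly cover every reaction in a pathway. The sketch below is an illustration under simplifying assumptions, not OrtSuite's algorithm: each reaction is treated as simply present or absent per species (GPR rules and user constraints are ignored), and the function names and the `max_size` parameter are hypothetical.

```python
from itertools import combinations

def complete_single_species(potential, pathway):
    """Species whose own reactions cover the whole pathway."""
    required = set(pathway)
    return sorted(sp for sp, rx in potential.items() if required <= set(rx))

def minimal_consortia(potential, pathway, max_size=3):
    """Smallest species combinations that jointly cover the pathway.

    Combinations containing an already found smaller solution are skipped,
    so only minimal groups are reported.
    """
    required = set(pathway)
    found = []
    for size in range(1, max_size + 1):
        for combo in combinations(sorted(potential), size):
            if any(set(prev) <= set(combo) for prev in found):
                continue  # superset of a smaller solution
            covered = set().union(*(potential[sp] for sp in combo))
            if required <= covered:
                found.append(combo)
    return found
```

The search is exhaustive, so its cost grows quickly with the number of species; capping the group size keeps it tractable for small genome sets.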
Figure 3. Example of the interactive network visualization included in OrtSuite results.
(A) The complete network, with species colored by reaction. (B) Species can be highlighted for simple identification. (C) Tooltips on reactions link out to KEGG if the reaction identifier is given.