Michalis Hadjithomas, I-Min A Chen, Ken Chu, Jinghua Huang, Anna Ratner, Krishna Palaniappan, Evan Andersen, Victor Markowitz, Nikos C Kyrpides, Natalia N Ivanova.
Abstract
Secondary metabolites produced by microbes have diverse biological functions, which makes them a promising source of biotechnologically relevant compounds with antimicrobial, anti-cancer and other activities. The proteins needed to synthesize these natural products are often encoded by clusters of co-located genes called biosynthetic gene clusters (BCs). To advance the exploration of microbial secondary metabolism, we developed the largest publicly available database of experimentally verified and predicted BCs, the Integrated Microbial Genomes Atlas of Biosynthetic gene Clusters (IMG-ABC) (https://img.jgi.doe.gov/abc/). Here, we describe an update of IMG-ABC, which includes ClusterScout, a tool for targeted identification of custom biosynthetic gene clusters across 40 000 isolate microbial genomes, and a new search capability to query more than 700 000 BCs from isolate genomes for clusters with similar Pfam composition. Two new interactive visualization features, a BC function heatmap and a BC similarity network graph, enable fast exploration and analysis of BCs. These new tools and features add to the value of IMG-ABC's vast body of BC data, facilitating in-depth analysis of that data and accelerating secondary metabolite discovery.
Year: 2016 PMID: 27903896 PMCID: PMC5210574 DOI: 10.1093/nar/gkw1103
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Similarity Search. (A) BC similarity search is accessible through the ‘Similar Clusters’ tab (i) in the Biosynthetic Cluster Detail page, shown here for the streptomycin BC from Streptomyces griseus. (B) The results are presented as a table that includes the ‘Jaccard Score’ and the ‘Adjusted Jaccard Score’. As expected, two more experimentally verified BCs known to produce streptomycin are retrieved, in addition to multiple predicted BCs. Users can further analyze these BCs by (i) adding them to the BC Cart or (ii) visualizing the neighborhoods of selected BCs to compare them with the query BC.
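The ‘Jaccard Score’ in the results table can be illustrated with a minimal sketch, assuming each BC is reduced to the set of Pfam families it contains. The BC names and Pfam identifiers below are placeholders chosen for illustration, and `jaccard` and `rank_similar` are hypothetical helper names, not part of IMG-ABC. The ‘Adjusted Jaccard Score’ is not defined in this record, so it is not reproduced here.

```python
def jaccard(a, b):
    """Jaccard index of two Pfam sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Two empty sets share nothing to compare; report 0.0 by convention.
        return 0.0
    return len(a & b) / len(union)


def rank_similar(query_pfams, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [(name, jaccard(query_pfams, pfams))
              for name, pfams in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Placeholder data (assumption): not real IMG-ABC clusters.
query = {"PF00001", "PF00002", "PF00003"}
candidates = {
    "bc_a": {"PF00001", "PF00002", "PF00003"},  # identical composition
    "bc_b": {"PF00002", "PF00003", "PF00004"},  # 2 shared of 4 total
    "bc_c": {"PF00005"},                        # nothing shared
}
ranking = rank_similar(query, candidates)
for name, score in ranking:
    print(f"{name}\t{score:.2f}")
```

Because the score depends only on which Pfams are present, two clusters with the same families in different copy numbers or gene orders would score 1.0 under this sketch.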
Figure 2. ClusterScout. (A) User interface with algorithm logic schematic. (B) The results from a ClusterScout search are presented in a table, and custom BCs can be added to the BC Cart for analysis.
Figure 3. IMG-ABC case study. (A) The BC Cart is the virtual space where BC analysis can be performed. (B) The function heatmap visualization can be used to study the Pfam content of BCs. Cells are colored with hues of green based on the number of copies of the selected Pfam in the BC (darker signifies a higher copy number). Hovering the mouse pointer over a cell shows the number of copies of that Pfam in the BC. Pfams (columns) that occur in all BCs (rows) likely define the core functions of the BCs in view. Column and row metadata are found on top and to the right of the heatmap, respectively. Hovering the mouse pointer over these metadata cells provides more detailed information. These metadata can be used for quick visual inspection and identification of patterns. For example, betaproteobacteria containing the DAPG BC are easily discoverable (asterisk). (C) The similarity network graph provides another way to summarize the data. The BCs in this example fall into three distinct groups: two groups contain gammaproteobacteria (red nodes), while one group consists of betaproteobacteria (purple nodes). The green node represents the experimentally verified BC for DAPG (from the gammaproteobacterium Pseudomonas fluorescens) in the IMG-ABC database. Clicking on a node reveals the metadata associated with the BC (table on the right). The color of the nodes can be changed to display different metadata, such as taxonomic classification or evidence. (D) Visualization of the putative DAPG BC neighborhoods from the four betaproteobacterial BCs and one BC from each gammaproteobacterial group shows that, although the flanking regions of the BCs differ, the core genes are conserved. It is therefore likely that these newly discovered BCs encode the proteins necessary for DAPG production.
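The data behind the function heatmap in panel (B) can be sketched as a copy-number matrix with BCs as rows and Pfams as columns, from which the Pfams present in every BC (the likely core functions) follow directly. This is only an illustration under stated assumptions: the input layout (one Pfam identifier per gene copy), the function names, and the sample identifiers are hypothetical and do not reflect how IMG-ABC stores or computes its data.

```python
from collections import Counter


def copy_number_matrix(bc_genes):
    """Build a BC-by-Pfam copy-number matrix.

    bc_genes maps a BC name to a list of Pfam IDs, one entry per gene copy.
    Returns (rows, columns, matrix) with rows and columns sorted by name.
    """
    counts = {bc: Counter(pfams) for bc, pfams in bc_genes.items()}
    columns = sorted({pfam for c in counts.values() for pfam in c})
    rows = sorted(counts)
    # Counter returns 0 for a Pfam that is absent from a BC.
    matrix = [[counts[bc][pfam] for pfam in columns] for bc in rows]
    return rows, columns, matrix


def core_pfams(columns, matrix):
    """Pfams (columns) with at least one copy in every BC (row)."""
    return [pfam for j, pfam in enumerate(columns)
            if all(row[j] > 0 for row in matrix)]


# Placeholder data (assumption): not real IMG-ABC clusters.
bc_genes = {
    "bc_1": ["PF00001", "PF00001", "PF00002"],
    "bc_2": ["PF00001", "PF00003"],
    "bc_3": ["PF00001", "PF00002", "PF00002"],
}
rows, columns, matrix = copy_number_matrix(bc_genes)
core = core_pfams(columns, matrix)
for bc, row in zip(rows, matrix):
    print(bc, row)
print("core:", core)
```

In the heatmap, each matrix value would map to a shade of green, and a column with no zero entries corresponds to a Pfam shared by all BCs in view.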