Antonio Starcevic, Jurica Zucko, Jurica Simunkovic, Paul F Long, John Cullum, Daslav Hranueli.
Abstract
The program package 'ClustScan' (Cluster Scanner) is designed for rapid, semi-automatic annotation of DNA sequences encoding modular biosynthetic enzymes including polyketide synthases (PKS), non-ribosomal peptide synthetases (NRPS) and hybrid (PKS/NRPS) enzymes. The program displays the predicted chemical structures of products and allows export of the structures in a standard format for analysis with other programs. Recent advances in the understanding of enzyme function are incorporated to make knowledge-based predictions about the stereochemistry of products. The program structure allows easy incorporation of additional knowledge about domain specificities and function. The results of analyses are presented to the user in a graphical interface, which also allows easy editing of the predictions to incorporate user experience. The versatility of this program package has been demonstrated by annotating biochemical pathways in microbial, invertebrate animal and metagenomic datasets. The speed and convenience of the package allow the annotation of all PKS and NRPS clusters in a complete Actinobacteria genome in 2-3 man-hours. The open architecture of ClustScan allows easy integration with other programs, facilitating further analyses of results, which is useful for a broad range of researchers in the chemical and biological sciences.
Year: 2008 PMID: 18978015 PMCID: PMC2588505 DOI: 10.1093/nar/gkn685
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. (A) The workspace window gives an overview of the analysis in the form of collapsible trees. Detected genes and protein domains are shown. (B) The annotation editor window shows the location of genes (in red) and protein domains (in blue). In this case there are three genes on the three different forward open reading frames. The genes have been displaced from the reading frames by the user to allow better visualization of the domains. The annotation editor has been used for user definition of modules (shown as red curves below the open reading frames). (C) The cluster editor window. The user can define a set of contiguous genes as a cluster. The cluster editor window shows the genes in a cartoon form with an expanded view of the selected gene showing protein domains. Domains can be linked together to give modules. The modules are given identifying names and the program suggests a biosynthetic order that can be accepted or altered by the user.
Figure 2. The details window allows the user to examine the evidence for assignment of protein domains. The HMMER scores and E-values as well as the alignment are displayed. The predictions of activity and specificity are also displayed and can be modified by the user. (A) The loading AT domain of the erythromycin cluster. The program makes the correct prediction of a propionyl starter unit. By clicking on this choice, a selection window has been opened that allows the user to override the automatic prediction and select an alternative choice. (B) The KR domain of module 3 of the erythromycin cluster.
Figure 3. The molecules window. (A) The SMILES description for the linear backbone of erythromycin predicted from the DNA sequence of the cluster. The SMILES description can be copied to the clipboard for export. (B) The 3D structure of the predicted linear chain is shown. The mouse can be used to rotate the molecule. (C) The ring structure of the erythromycin aglycone as predicted using the cyclization function of the program.
Figure 4. Annotation editor window showing the analysis of a potential PKS–NRPS hybrid cluster from a marine metagenomic sequence. The following coloring is used: genes (red), PKS protein domains (green) and NRPS protein domains (blue). Although seven genes are shown, the distribution of domains between genes suggests that sequencing errors have occurred. The three boxes indicate the positions of the probable genes. The first gene has one frameshift, the second gene has two frameshifts and the third gene has an anomalous stop codon (ringed in black). The positions where two AT domains would be expected are also ringed (in yellow).
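The kind of evidence described in Figure 4 can be illustrated with a small sketch. This is not ClustScan's code or algorithm. It is a minimal Python illustration, assuming 0-based nucleotide coordinates on the forward strand, of how a change of reading frame between consecutive domains or a stop codon inside a putative gene might be flagged. The function names (`forward_frame`, `in_frame_stops`, `frame_changes`) and the toy sequence are hypothetical.

```python
from typing import List, Tuple

# Standard stop codons on the forward strand (assumption: standard genetic code).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def forward_frame(start: int) -> int:
    """Return the forward reading frame (0, 1 or 2) of a 0-based start coordinate."""
    return start % 3


def in_frame_stops(seq: str, start: int, end: int) -> List[int]:
    """Return 0-based positions of stop codons read in the frame of `start`,
    considering only complete codons that lie within [start, end)."""
    seq = seq.upper()
    return [i for i in range(start, end - 2, 3) if seq[i:i + 3] in STOP_CODONS]


def frame_changes(domain_starts: List[int]) -> List[Tuple[int, int]]:
    """Return adjacent pairs of domain start coordinates whose reading frames
    differ. A change of frame between neighbouring domains that would be
    expected to lie in one gene is a hint of a possible frameshift."""
    pairs = zip(domain_starts, domain_starts[1:])
    return [(a, b) for a, b in pairs if forward_frame(a) != forward_frame(b)]


if __name__ == "__main__":
    toy = "ATGAAATAGCCCTGA"  # hypothetical toy sequence, not real data
    print(in_frame_stops(toy, 0, len(toy)))  # stops in frame 0
    print(frame_changes([0, 300, 601]))      # third domain lies in another frame
```

In practice, hints of this kind would only prompt inspection. As the caption notes, it is the distribution of domains between genes that points to the probable sequencing errors.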