Nadine Ziemert, Sheila Podell, Kevin Penn, Jonathan H Badger, Eric Allen, Paul R Jensen.
Abstract
New bioinformatic tools are needed to analyze the growing volume of DNA sequence data. This is especially true in the case of secondary metabolite biosynthesis, where the highly repetitive nature of the associated genes creates major challenges for accurate sequence assembly and analysis. Here we introduce the web tool Natural Product Domain Seeker (NaPDoS), which provides an automated method to assess the secondary metabolite biosynthetic gene diversity and novelty of strains or environments. NaPDoS analyses are based on the phylogenetic relationships of sequence tags derived from polyketide synthase (PKS) and non-ribosomal peptide synthetase (NRPS) genes. The sequence tags correspond to PKS-derived ketosynthase (KS) domains and NRPS-derived condensation (C) domains and are compared to an internal database of experimentally characterized biosynthetic genes. NaPDoS provides a rapid mechanism to extract and classify ketosynthase and condensation domains from PCR products, genomes, and metagenomic datasets. Close database matches provide a mechanism to infer the generalized structures of secondary metabolites, while new phylogenetic lineages provide targets for the discovery of new enzyme architectures or mechanisms of secondary metabolite assembly. Here we outline the main features of NaPDoS and test it on four draft genome sequences and two metagenomic datasets. The results provide a rapid method to assess secondary metabolite biosynthetic gene diversity and richness in organisms or environments and a mechanism to identify genes that may be associated with uncharacterized biochemistry.
Year: 2012 PMID: 22479523 PMCID: PMC3315503 DOI: 10.1371/journal.pone.0034064
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
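The comparison of sequence tags against a reference database can be illustrated with best-hit selection over BLAST+ tabular output (`-outfmt 6`, default 12 columns), followed by a crude identity threshold separating close database matches from candidate novel domains. This is a minimal Python sketch under stated assumptions: the e-value cutoff (1e-5) and the 85% identity threshold are illustrative values, not NaPDoS parameters, and all query and reference IDs in the usage rows are made up. NaPDoS itself judges novelty in a phylogenetic context rather than with a single identity cutoff.

```python
from dataclasses import dataclass

# Column order of BLAST+ tabular output with -outfmt 6 (default 12 fields).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


@dataclass
class BestHit:
    query: str
    reference: str
    pident: float
    evalue: float
    bitscore: float


def best_hits(lines, max_evalue=1e-5):
    """Keep the highest-bitscore hit per query that passes the e-value cutoff.

    max_evalue is an illustrative assumption, not a NaPDoS default.
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(BLAST_COLUMNS, line.split("\t")))
        evalue = float(row["evalue"])
        if evalue > max_evalue:
            continue  # too weak to count as a database match
        hit = BestHit(row["qseqid"], row["sseqid"], float(row["pident"]),
                      evalue, float(row["bitscore"]))
        current = best.get(hit.query)
        if current is None or hit.bitscore > current.bitscore:
            best[hit.query] = hit
    return best


def label(hit, min_identity=85.0):
    """Hypothetical rule: high percent identity suggests a close database match."""
    return "close match" if hit.pident >= min_identity else "candidate novel"


# Made-up BLAST rows for illustration only.
blast_output = [
    "ks_001\tref_A\t92.1\t420\t33\t0\t1\t420\t5\t424\t1e-150\t540.0",
    "ks_001\tref_B\t61.0\t410\t160\t2\t1\t410\t3\t412\t1e-90\t320.0",
    "ks_002\tref_C\t48.5\t400\t206\t3\t1\t400\t1\t398\t1e-40\t160.0",
    "ks_003\tref_C\t40.0\t100\t60\t1\t1\t100\t1\t100\t0.5\t30.0",
]
hits = best_hits(blast_output)
for query, hit in sorted(hits.items()):
    print(query, hit.reference, label(hit))
```

Here `ks_003` is dropped by the e-value filter, and only the stronger of the two `ks_001` hits is kept.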
Figure 1. NaPDoS bioinformatic pipeline.
The web interface to this pipeline is divided into three consecutive steps. Nucleic acid sequences are translated into predicted amino acid sequences, and genomic sequences are screened using hidden Markov models (HMMs). For protein and short nucleic acid sequences, a BLAST search is performed against curated reference database examples to identify matches to known PKS/NRPS pathways. Selected candidate sequences, together with their BLAST results, are trimmed and inserted into a manually curated reference alignment, keeping the original reference alignment intact. This alignment is then used to build a phylogenetic tree.
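The first step, translating nucleic acid input into predicted amino acid sequences, can be sketched as a six-frame translation. NaPDoS's own translation code is not described here; this is a minimal Python sketch assuming the standard genetic code (bacterial translation table 11 uses the same codon assignments and differs only in its start codons).

```python
# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int = 0) -> str:
    """Translate one reading frame; '*' marks stops, 'X' marks ambiguous codons."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame(seq: str) -> list[str]:
    """Three frames on the given strand, then three on its reverse complement."""
    rc = reverse_complement(seq)
    return [translate(seq, f) for f in range(3)] + [translate(rc, f) for f in range(3)]


frames = six_frame("ATGAAATAG")
print(frames)  # ['MK*', '*N', 'EI', 'LFH', 'YF', 'IS']
```

Incomplete trailing codons are simply dropped, so each frame yields only whole codons.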
Figure 2. Screenshot of the NaPDoS web page.
Figure 3. Phylogeny-based domain classification.
A) KS domain phylogeny. Polyphyletic groups are distinguished by letters. B) C domain phylogeny.
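NaPDoS classifies domains by their placement in a phylogenetic tree. As a much simpler stand-in, named plainly here as nearest-neighbour assignment rather than tree building, a query already inserted into a reference alignment can be given the class of the reference with the smallest uncorrected p-distance. This Python sketch is an assumption-laden illustration: the reference IDs and aligned sequences are toy values, and only the class labels come from the KS classification table.

```python
def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = differences = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x != y:
            differences += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return differences / compared


def nearest_class(query: str, references: dict) -> tuple:
    """references maps reference id -> (class label, aligned sequence)."""
    best_id = min(references, key=lambda rid: p_distance(query, references[rid][1]))
    ks_class, seq = references[best_id]
    return best_id, ks_class, p_distance(query, seq)


# Toy aligned sequences, for illustration only.
references = {
    "ref_A": ("enediyne", "MKLV-AGT"),
    "ref_B": ("type II", "MRLIQSGS"),
}
result = nearest_class("MKLVQAGT", references)
print(result)  # ('ref_A', 'enediyne', 0.0)
```

A real phylogenetic placement also weighs tree topology, so a query can fall outside every known clade; the nearest-neighbour rule always returns some class.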
KS domain classification.
| Type | Class | Description | Product (example) |
|---|---|---|---|
| I | Enediyne | Iteratively acting; builds the typical 9- or 10-membered enediynes. | Enediyne (calicheamicin) |
| | trans-AT | Module lacks a cognate AT domain; this activity is instead provided by a discrete, separately encoded protein. | Polyketide/macrolide (leinamycin) |
| | cis-AT | Multi-domain module that includes an AT domain. | Polyketide/macrolide (erythromycin) |
| | Hybrid | Catalyzes a condensation reaction between an amino acid and an acyl extender unit in a hybrid NRPS/PKS pathway. | Peptide-polyketide (microcystin) |
| | Iterative | Domain is used multiple times in a cyclic fashion. | Polycyclic polyketide (aflatoxin) |
| | PUFA | Produces long-chain fatty acids containing more than one double bond. | Polyunsaturated fatty acid (omega-3 fatty acid) |
| | KS1 | Occurs in the first module of multimodular genes; includes typical starter KSs (KSQ) as well as KS domains that incorporate unusual precursors. | Polyketide, peptide-polyketide (salinosporamide) |
| II | Type II | Each domain occurs on a discrete protein. | Aromatic polyketide (actinorhodin) |
| | FAS | Involved in fatty acid biosynthesis (e.g., FabB and FabF from bacteria). | Fatty acid (palmitic acid) |
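Because close database matches are used to infer the generalized structure of a product, the KS classification can be held as a simple lookup that annotates a classified domain with its class's example product. This is a minimal Python sketch: the dictionary keys are illustrative class labels chosen for this example, not NaPDoS identifiers (the two modular rows, a module lacking versus including an AT domain, are keyed here as trans-AT and cis-AT), and the values are copied from the Product column.

```python
# Example products per KS class, copied from the table's Product column.
# Keys are illustrative labels for this sketch, not NaPDoS identifiers.
KS_CLASS_PRODUCTS = {
    "enediyne": "Enediyne (calicheamicin)",
    "trans-AT": "Polyketide/macrolide (leinamycin)",
    "cis-AT": "Polyketide/macrolide (erythromycin)",
    "hybrid": "Peptide-polyketide (microcystin)",
    "iterative": "Polycyclic polyketide (aflatoxin)",
    "PUFA": "Polyunsaturated fatty acid (omega-3 fatty acid)",
    "KS1": "Polyketide, peptide-polyketide (salinosporamide)",
    "type II": "Aromatic polyketide (actinorhodin)",
    "FAS": "Fatty acid (palmitic acid)",
}

# Case-insensitive view of the same table.
_LOOKUP = {label.lower(): product for label, product in KS_CLASS_PRODUCTS.items()}


def product_hint(ks_class: str) -> str:
    """Return the example product for a KS class label, or 'unclassified'."""
    return _LOOKUP.get(ks_class.strip().lower(), "unclassified")


print(product_hint("Type II"))
```

The hint names only a generalized product type; it does not predict a specific structure.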