Thomas Wolf, Vladimir Shelest, Neetika Nath, Ekaterina Shelest.
Abstract
MOTIVATION: Secondary metabolites (SM) are structurally diverse natural products of high pharmaceutical importance. Genes involved in their biosynthesis are often organized in clusters, i.e., are co-localized and co-expressed. In silico cluster prediction in eukaryotic genomes remains problematic mainly due to the high variability of the clusters' content and lack of other distinguishing sequence features.
Year: 2015 PMID: 26656005 PMCID: PMC4824125 DOI: 10.1093/bioinformatics/btv713
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Choosing the promoter range. The great majority of the genuine fungal TFBS from TRANSFAC and FunTF map to the region −1000/+50 bp.
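As a rough illustration of how the −1000/+50 bp range could be turned into genomic coordinates, the sketch below computes a promoter window for a gene on either strand. The helper name `promoter_window`, the 1-based inclusive coordinate convention, and the choice of reference point (the caption does not say whether positions are counted from the transcription or the translation start) are all assumptions made for this example, not details taken from the paper.

```python
def promoter_window(start, strand, upstream=1000, downstream=50, seq_len=None):
    """Return (left, right), 1-based inclusive, for a promoter region.

    start      : coordinate of the gene start (treated as position +1)
    strand     : '+' or '-'
    upstream   : bases taken before the gene start (assumed 1000)
    downstream : bases taken from the gene start onward (assumed 50)
    seq_len    : optional contig length used to clip the window
    """
    if strand == "+":
        # Upstream lies at smaller coordinates on the forward strand.
        left = start - upstream
        right = start + downstream - 1
    elif strand == "-":
        # On the reverse strand, upstream lies at larger coordinates.
        left = start - downstream + 1
        right = start + upstream
    else:
        raise ValueError("strand must be '+' or '-'")

    # Clip to the contig boundaries.
    left = max(1, left)
    if seq_len is not None:
        right = min(right, seq_len)
    return left, right


if __name__ == "__main__":
    print(promoter_window(5000, "+"))
    print(promoter_window(5000, "-"))
```

Under these assumptions an unclipped window always spans 1050 bp; genes close to a contig end simply receive a shorter promoter.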
Fig. 2. CASSIS algorithm. (A) Interim promoter sets around the anchor gene are submitted to MEME for motif prediction. (B) All found motifs are selected. (C) The motifs are submitted to FIMO for the genome-wide prediction in promoter (Pr) sequences. (D) The sequence of promoters, each characterized by the number of found motifs, is considered as the string of numbers. This number string is searched for an ‘island’ of mostly non-zero values, which is regarded as the cluster.
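Step (D) of the caption can be pictured with a small, deliberately simplified sketch. It is not the CASSIS implementation: the function name `find_island`, the `max_gap` parameter (how many consecutive zero-count promoters are tolerated inside an island), and the example counts are assumptions introduced here only to show what searching a number string for an island of mostly non-zero values around an anchor might look like.

```python
def find_island(counts, anchor, max_gap=1):
    """Return (left, right) indices of the island containing `anchor`.

    counts  : motif hits per promoter, in genomic order
    anchor  : index of the anchor gene's promoter
    max_gap : tolerated run of consecutive zero-count promoters (assumed)

    Returns None if the anchor promoter itself has no motif hit.
    """
    if not 0 <= anchor < len(counts):
        raise IndexError("anchor is outside the promoter list")
    if counts[anchor] == 0:
        return None

    def extend(step):
        edge = anchor          # last non-zero promoter seen in this direction
        gap = 0                # current run of zero-count promoters
        i = anchor + step
        while 0 <= i < len(counts):
            if counts[i] > 0:
                edge = i
                gap = 0
            else:
                gap += 1
                if gap > max_gap:
                    break
            i += step
        return edge            # trailing zeros are never part of the island

    return extend(-1), extend(+1)


if __name__ == "__main__":
    # Hypothetical motif counts; the anchor gene sits at index 5.
    hits = [0, 0, 1, 0, 2, 1, 3, 0, 0, 1, 0]
    print(find_island(hits, anchor=5, max_gap=1))
```

Raising `max_gap` makes the island more tolerant of promoters without the motif and therefore tends to predict longer clusters, while `max_gap=0` requires an unbroken run of non-zero values.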
Benchmark results of the leave-one-out (LOO) cross-validation for CASSIS
| Characteristics | CASSIS performance |
|---|---|
| Sensitivity | 0.84 ± 0.0010 |
| Specificity | 0.98 ± 0.0002 |
| Precision | 0.71 ± 0.0010 |
| Accuracy | 0.96 ± 0.0002 |
| FDR | 0.29 ± 0.0010 |
| | 0.73 ± 0.0008 |
a) Average for all 38 LOO experiments. Error is the standard error of the mean. See Supplementary Table S1 for the list of used clusters.
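For readers who want to relate the labelled rows of the table to raw counts, the following sketch applies the textbook definitions of sensitivity, specificity, precision, accuracy and FDR to a confusion matrix. The function name `classification_metrics`, the idea that a "positive" is a gene assigned to the cluster, and the example counts are assumptions for illustration; the record does not state how the counts were obtained.

```python
def _ratio(numerator, denominator):
    """Divide safely; return NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics.

    tp : genes correctly predicted as cluster members
    fp : genes wrongly predicted as cluster members
    tn : genes correctly left out of the cluster
    fn : true cluster genes that were missed
    """
    return {
        "sensitivity": _ratio(tp, tp + fn),            # recall
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "fdr": _ratio(fp, tp + fp),                    # equals 1 - precision
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from the benchmark.
    for name, value in classification_metrics(tp=8, fp=2, tn=38, fn=2).items():
        print(f"{name}: {value:.2f}")
```

Because FDR and precision share a denominator, they always sum to 1, which matches the Precision and FDR rows of the table (0.71 and 0.29).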
Comparison of CASSIS with the similarity-based antiSMASH and SMURF tools: re-identification of the 12 test clusters not used for the tools’ training
| Characteristics | CASSIS | antiSMASH | SMURF |
|---|---|---|---|
| Sensitivity | 0.87 ± 0.04 | 0.94 ± 0.04 | 0.78 ± 0.10 |
| Specificity | 0.96 ± 0.01 | 0.87 ± 0.02 | 0.84 ± 0.02 |
| Precision | 0.80 ± 0.05 | 0.54 ± 0.05 | 0.42 ± 0.06 |
| Accuracy | 0.94 ± 0.01 | 0.88 ± 0.01 | 0.82 ± 0.02 |
| FDR | 0.20 ± 0.05 | 0.46 ± 0.05 | 0.58 ± 0.06 |
| | 0.81 ± 0.02 | 0.66 ± 0.04 | 0.51 ± 0.06 |
a) Average for all 12 clusters. Error is the standard error of the mean. See Supplementary Table S1 for the list of used clusters.
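The "mean ± standard error of the mean" entries in the tables can be reproduced from per-cluster values along the following lines. This is a sketch under assumptions: the helper name `mean_and_sem` is invented here, the sample standard deviation (n − 1 denominator) is assumed because the record does not say which estimator was used, and the input values are made up.

```python
import math
import statistics


def mean_and_sem(values):
    """Return (mean, standard error of the mean) for a list of numbers.

    Uses the sample standard deviation (n - 1 denominator), which is an
    assumption; the source does not specify the estimator.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed for the SEM")
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(n)
    return mean, sem


if __name__ == "__main__":
    # Hypothetical per-cluster sensitivities, not data from the paper.
    per_cluster = [0.8, 0.9, 1.0, 0.7]
    mean, sem = mean_and_sem(per_cluster)
    print(f"{mean:.2f} ± {sem:.2f}")
```

The SEM shrinks with the square root of the number of experiments, which is one reason the errors over 38 LOO runs are much smaller than those over the 12 test clusters.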