Yuan Zhou, Shiping Yang, Tonglin Mao, Ziding Zhang.
Abstract
The wide functional impacts of microtubules are unleashed and controlled by a battery of microtubule-associated proteins (MAPs). Specialists in the field appreciate the diversity of known MAPs and continue to drive the identification of novel MAPs. By contrast, there is neither a dedicated database to record known MAPs nor a MAP predictor that can facilitate the discovery of potential MAPs. Here we report the establishment of a MAP-centered online analysis tool, MAPanalyzer, which consists of a MAP database and a MAP predictor. In the database, a core MAP dataset, which is fully manually curated from the literature, is further enriched by MAP information collected via an automated pipeline. The core dataset, in turn, enables the building of a novel MAP predictor that combines specialized machine learning classifiers and the BLAST homology searching tool. Benchmarks on the curated testing dataset and the Arabidopsis thaliana whole genome dataset have shown that the proposed predictor outperforms not only its own components (i.e. the machine learning classifiers and BLAST), but also another popular homology searching tool, PSI-BLAST. Therefore, MAPanalyzer will serve as a promising computational resource for the investigation of MAPs. Database URL: http://systbio.cau.edu.cn/mappred/.
Year: 2015 PMID: 26568329 PMCID: PMC4644220 DOI: 10.1093/database/bav108
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Statistics of the manually curated core dataset. (A) The fraction of different classes of microtubule-related proteins; (B) Statistics of source organisms, including human (Homo sapiens), mouse (Mus musculus), fruit fly (Drosophila melanogaster), Arabidopsis (Arabidopsis thaliana), rat (Rattus norvegicus), budding yeast (Saccharomyces cerevisiae), toad (Xenopus laevis), fission yeast (Schizosaccharomyces pombe) and others; (C) Publication year distribution of the supporting references; (D) Overlap with the UniProtKB and Gene Ontology (GO) databases (version of December 2014), where green bars (‘all’) present the statistics for all of the microtubule-related proteins, while the counts indicated by blue bars (‘direct’) only take into consideration proteins that directly bind microtubules.
Table 1. Performance comparison on the curated testing dataset at various stringency thresholds
| Method | Stringency | Threshold | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|
| lapSVM (motif) | Very high | 0.42 | 8.3 | 99.0 |
| | High | 0.1 | 33.3 | 95.0 |
| | Moderate | −0.187 | 45.8 | 90.0 |
| lapSVM (CKSAAP) | Very high | 0.274 | 6.3 | 99.0 |
| | High | 0.041 | 37.5 | 95.0 |
| | Moderate | −0.06 | 52.0 | 90.0 |
| BLAST | Very high | 30 | 22.9 | 99.0 |
| | High | 4.22 | 27.1 | 95.0 |
| | Moderate | 1.54 | 31.2 | 90.0 |
| PSIBLAST | Very high | 84.4 | 18.8 | 99.0 |
| | High | 23 | 29.1 | 95.0 |
| | Moderate | 8.05 | 39.6 | 90.0 |
| Combined | Very high | 0.121 | 25.0 | 99.0 |
| | High | 0.019 | 41.7 | 95.0 |
| | Moderate | −0.008 | 56.3 | 90.0 |
| | Low | −0.042 | 75.0 | 80.0 |
The combined predictor integrates two lapSVM classifiers (based on the representative motifs and the CKSAAP encoding, respectively) with BLAST. For a fair comparison, we have applied three stringency thresholds to each predictor, corresponding to specificities of 99, 95 and 90%, respectively. A low stringency threshold is also applied for the combined predictor to enable more sensitive predictions.
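The idea of fixing a threshold at a target specificity, as described in the note above, can be illustrated with a minimal Python sketch. It is not the authors' procedure: the function names, the decision rule (a protein is called a MAP when its score is strictly greater than the threshold) and the scores are all hypothetical and chosen only for illustration.

```python
import math


def threshold_for_specificity(negative_scores, target_specificity):
    """Return the lowest threshold that keeps specificity >= target.

    Assumption: a sample is predicted positive when score > threshold.
    """
    ranked = sorted(negative_scores, reverse=True)
    # Number of negatives that may be misclassified as positive.
    # The small epsilon guards against floating-point round-off.
    allowed_fp = int(math.floor((1.0 - target_specificity) * len(ranked) + 1e-9))
    if allowed_fp >= len(ranked):
        return float("-inf")  # every negative may be a false positive
    # At most `allowed_fp` negatives score strictly above this value.
    return ranked[allowed_fp]


def sensitivity_specificity(positive_scores, negative_scores, threshold):
    """Compute (sensitivity, specificity) under the same decision rule."""
    true_pos = sum(score > threshold for score in positive_scores)
    true_neg = sum(score <= threshold for score in negative_scores)
    return true_pos / len(positive_scores), true_neg / len(negative_scores)


# Hypothetical predictor scores, not taken from the paper.
negatives = [-0.5, -0.4, -0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3, 0.4]
positives = [0.35, 0.5, 0.1, -0.2]

threshold = threshold_for_specificity(negatives, 0.90)
sensitivity, specificity = sensitivity_specificity(positives, negatives, threshold)
print(threshold, sensitivity, specificity)
```

Calibrating every predictor against the same negative set in this way is what makes sensitivities comparable across methods whose raw scores live on different scales, such as SVM decision values and BLAST-derived scores.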
Table 2. Performance comparison on the Arabidopsis whole genome dataset at the predefined thresholds
| Method | Running time (h) | Stringency | Threshold | Sensitivity (%) | Specificity (%) | MCC |
|---|---|---|---|---|---|---|
| Combined | 3 | Very high | 0.121 | 9.0 | 98.8 | 0.102 |
| | | High | 0.019 | 17.7 | 94.9 | 0.090 |
| | | Moderate | −0.008 | 28.0 | 92.0 | 0.114 |
| | | Low | −0.042 | 48.2 | 83.9 | 0.136 |
| BLAST | 1.5 | Very high | 30 | 8.9 | 98.5 | 0.091 |
| | | High | 4.22 | 16.6 | 92.0 | 0.050 |
| | | Moderate | 1.54 | 21.6 | 88.3 | 0.049 |
| PSIBLAST | 4328 | Very high | 84.4 | 9.9 | 98.8 | 0.114 |
| | | High | 23 | 16.6 | 92.6 | 0.055 |
| | | Moderate | 8.05 | 26.2 | 87.9 | 0.068 |
The combined predictor integrates two lapSVM classifiers (based on the representative motifs and the CKSAAP encoding, respectively) with BLAST. The thresholds at the different stringency levels are the same as those used in Table 1. The low stringency threshold is also applied for the combined predictor to enable more sensitive predictions. The running time is the time consumed on a Dell PowerEdge R810 server using a single CPU (Intel Xeon CPU E7-4807, 1.87 GHz).
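The whole genome comparison reports MCC alongside sensitivity and specificity. Assuming MCC denotes the Matthews correlation coefficient, as is usual in this setting, a minimal Python sketch of the standard formula follows. The function name and the confusion-matrix counts are hypothetical and are not taken from the paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
    Returns 0.0 when the denominator is zero (a common convention).
    """
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return numerator / denominator


# Hypothetical, heavily imbalanced counts: few true positives among
# many negatives, as would be expected on a whole genome scan.
print(matthews_corrcoef(tp=10, tn=9800, fp=200, fn=90))
```

Because MCC uses all four cells of the confusion matrix, it stays low when positives are rare and false positives outnumber true positives, even if specificity is high. This would be consistent with the small MCC values in the whole genome comparison, although the paper's exact counts are not given here.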
Figure 2. The prediction page of MAPanalyzer. Two prediction modes (i.e. the single prediction mode and the batch prediction mode) are available, and the input form for the former is shown here. In the single prediction mode, a user can submit one protein sequence and the preferred threshold to run a prediction. Previous prediction results can be retrieved by entering the Job ID into the textbox located at the bottom of this prediction page.