Daniel Ming Ming Tay, Kunde Ramamoorthy Govindarajan, Asif M Khan, Terenze Yao Rui Ong, Hanif M Samad, Wei Wei Soh, Minyan Tong, Fan Zhang, Tin Wee Tan.
Abstract
BACKGROUND: Effectors of the Type III Secretion System (T3SS) play a pivotal role in establishing and maintaining pathogenicity in the host, so identifying these effectors is important for understanding virulence. However, the effectors display a high level of sequence diversity, which makes identification difficult. There is a need to collate and annotate existing effector sequences in public databases to enable systematic analyses of these sequences, in order to develop models for screening and selecting putative novel effectors from bacterial genomes that can then be validated by a smaller number of key experiments.
Year: 2010 PMID: 21106126 PMCID: PMC2957687 DOI: 10.1186/1471-2105-11-S7-S4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. T3SEdb search function and sample output page. A) The database can be queried by NCBI accession number, domain or general keyword; a search can also be restricted to the experimental status of the sequences (experimentally validated or hypothetical) or to a specific field in the sequence record. B) Search results display the database record with T3SEdb accession number, effector name, hyperlinked NCBI Entrez Protein database accession number, source organism of the effector, sequence length, experimental status, last sequence update, name and accession of the primary/source database that the effector was retrieved from, sequence data, literature references (hyperlinked PubMed IDs) and T3SEdb curation comments (if any).
Figure 2. High sequence diversity of T3SS effectors. At a 100% amino acid sequence identity threshold, the 504 experimentally validated effector sequences formed as many as 324 clusters. When the identity threshold was reduced to 90%, tolerating a 10% difference between sequences, the number of clusters dropped to 206. Reducing the identity threshold further, even to as low as 10%, did not reduce the number of clusters much more (171 clusters at 10% identity). This highlights the high level of amino acid difference between T3SS effectors.
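The cluster counts in Figure 2 come from grouping sequences at a given identity threshold. The source does not name the clustering tool or the identity measure, so the following is only a minimal sketch of the general idea, assuming a greedy scheme and a toy ungapped, position-by-position identity. The function names `positional_identity` and `greedy_cluster` and the toy sequences are hypothetical and are not part of T3SEdb.

```python
def positional_identity(a: str, b: str) -> float:
    """Toy identity: fraction of identical residues compared position by
    position, without gaps. ASSUMPTION: real analyses use alignments."""
    if not a or not b:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    # Normalise by the longer sequence so length differences lower the score.
    return matches / max(len(a), len(b))


def greedy_cluster(sequences, threshold):
    """Greedy clustering: each sequence joins the first cluster whose
    representative (its first member) meets the identity threshold."""
    clusters = []
    # Longest sequences are processed first and become representatives.
    for seq in sorted(sequences, key=len, reverse=True):
        for cluster in clusters:
            if positional_identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            # No existing representative is similar enough: start a new cluster.
            clusters.append([seq])
    return clusters


# Hypothetical toy sequences, not taken from the database.
toy_sequences = ["MKTAYIAK", "MKTAYIAK", "MKTAYIAR", "GGGGGGGG"]
for threshold in (1.0, 0.8, 0.1):
    n_clusters = len(greedy_cluster(toy_sequences, threshold))
    print(f"identity >= {threshold:.0%}: {n_clusters} clusters")
```

On this toy input, duplicates and near-duplicates merge as the threshold is relaxed, but the unrelated sequence stays in its own cluster even at a very low threshold. This mirrors the plateau described in the caption.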
Performance measures of binary classifiers in WEKA for prediction of T3SEs.
| Binary classifier | Training Aroc | Training SE | Training SP | Testing Aroc | Testing SE | Testing SP | Testing PPV |
|---|---|---|---|---|---|---|---|
| Bayesian Logistic Regression | 0.60 | 0.72 | 0.49 | 0.66 | 0.73 | 0.60 | 0.07 |
| Support vector machines (SVM) | 0.74 | 0.97 | 0.52 | 0.80 | 0.95 | 0.64 | 0.08 |
| BayesNet | 0.86 | 0.80 | 0.76 | 0.91 | 0.94 | 0.83 | 0.15 |
| Naïve Bayes | 0.89 | 0.84 | 0.82 | 0.93 | 0.91 | 0.83 | 0.17 |
SE, SP and PPV refer to sensitivity, specificity and positive predictive value, respectively. Aroc is the area under the receiver operating characteristic curve and is commonly used to assess the quality of a prediction model. The testing Aroc, SE and SP were computed on a balanced dataset of 68 effectors and non-effectors that were not part of the training. The PPV was computed using an unbalanced dataset representing the approximate proportion of effectors and non-effectors in a bacterial genome.
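The gap between the high testing SE/SP and the low PPV follows from class imbalance: when effectors are rare, even a modest false-positive rate produces more false hits than true ones. The sketch below uses the standard definitions of these measures. The prevalence value of 0.03 is a hypothetical illustration, since the source does not state the exact proportion used, and the function names are likewise assumptions.

```python
def confusion_metrics(tp: int, fn: int, tn: int, fp: int):
    """Return (sensitivity, specificity, PPV) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # fraction of true effectors recovered
    specificity = tn / (tn + fp)   # fraction of non-effectors rejected
    ppv = tp / (tp + fp)           # fraction of positive calls that are correct
    return sensitivity, specificity, ppv


def ppv_from_prevalence(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """PPV implied by SE and SP at a given fraction of true positives."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        return 0.0  # no positive calls at all
    return true_pos / (true_pos + false_pos)


# Testing SE and SP for Naive Bayes, taken from the table above.
se, sp = 0.91, 0.83
# ASSUMPTION: 0.03 is a hypothetical genome-wide effector fraction.
for prevalence in (0.5, 0.03):
    ppv = ppv_from_prevalence(se, sp, prevalence)
    print(f"prevalence {prevalence:.2f}: PPV = {ppv:.2f}")
```

With the same SE and SP, PPV is high on a balanced set and drops sharply once effectors make up only a few percent of the sequences. This is consistent with the pattern in the table, though the exact values depend on the proportion the authors used.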