C Pawan K Patro¹, Asif M Khan², Tin Wee Tan¹, Xin-Yuan Fu³.
Abstract
Signal transducers and activators of transcription (STAT) proteins are key signalling molecules in metazoans, implicated in various cellular processes. Growing research in the field has produced an accumulation of STAT sequence and structure data that are scattered across public databases, lack extensive functional annotation, and are prone to duplicated effort because of limited community sharing. There is therefore a need to integrate the existing sequence, structural and functional data into a central repository that is enriched with annotations and provides a platform for community contributions. Herein, we present STATdb (publicly available at http://statdb.bic.nus.edu.sg/), the first integrated resource for STAT sequences. It comprises 1540 records representing the known STATome, enriched with manual annotations and with existing structural and functional information from various databases and the literature. STATdb provides advanced features for data visualization, analysis and prediction, and for community contributions. A key feature is a meta-predictor that characterises STAT sequences using a novel classification integrating STAT domain architecture, lineage and function. A curation workflow has been devised for regulated, structured community contributions, together with an update policy for the seamless integration of new data and annotations.
Year: 2014 PMID: 25157689 PMCID: PMC4144846 DOI: 10.1371/journal.pone.0104597
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
List of all fields defined for STATdb records.
| Field Name | Description | Source/Assigned |
|---|---|---|
| STATdb Id | STATdb unique identifier/accession number | Assigned |
| gName | Gene name | Source & Assigned |
| pName | Protein name | Source |
| STAT type | STAT family sub-group based on function | Source & Assigned |
| STATdb Classification | Classification based on the three-tier system "Domain Architecture - Lineage - Function" | Assigned |
| DBXRef | Database cross-references | Source & Assigned |
| Literature | Literature (PubMed reference ID) | Source |
| Species (Source Organism) | Species containing the STAT | Source |
| Expt. Status | Experimental status: E (experimentally verified), P (predicted/hypothetical), U (unknown) | Assigned |
| Expt. Status Evidence | Experimental status evidence | Assigned |
| ChromLoc | Chromosome location | Source |
| IntPartners | Interacting proteins | Assigned |
| SeqLen | Sequence length (protein) | Source |
| Completeness | Completeness of the protein sequence (Complete/Incomplete) | Assigned |
| STAT Dom | STAT domains | Source |
| DomArchitecture | Domain architecture | Assigned |
| STAT DomSeq | Nucleotide & protein sequence of STAT domains | Assigned |
| STAT_int | Nucleotide & protein sequence of the protein interaction domain | Assigned |
| STAT_alpha | Nucleotide & protein sequence of the all-alpha domain | Assigned |
| STAT_bind | Nucleotide & protein sequence of the DNA binding domain | Assigned |
| STAT_sh2 | Nucleotide & protein sequence of the SH2 domain | Assigned |
| STAT_taz2 | Nucleotide & protein sequence of the TAZ2 domain | Assigned |
| BindingMotif | DNA binding motif | Assigned |
| NucSeq | Nucleotide sequence | Assigned |
| ProtSeq | Protein sequence | Source |
| Comment | STATdb curation comments | Assigned |
Fields whose information is taken from the source record (NCBI Entrez Protein database) are marked "Source". "Assigned" fields are not present in the source record; they were added to provide information obtained from analysis of the sequence data, existing annotations or the literature. The literature used is cited in the relevant records.
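To make the field list above concrete, here is a minimal sketch of how a subset of a STATdb record could be modelled in Python. This is an illustration under stated assumptions, not STATdb's actual schema: the class names `ExptStatus` and `STATdbRecord`, the snake_case attribute names and all types are hypothetical renderings of the table's fields, and every value in the example record is a placeholder rather than real STATdb data.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ExptStatus(Enum):
    """Experimental status codes from the 'Expt. Status' field."""
    E = "Experimentally Verified"
    P = "Predicted/Hypothetical"
    U = "Unknown"


@dataclass
class STATdbRecord:
    """Hypothetical model of a subset of STATdb record fields.

    Attribute names are illustrative renderings of the table's field
    names; the types are assumptions, not STATdb's real schema.
    """
    statdb_id: str                  # STATdb Id (Assigned)
    g_name: str                     # gName (Source & Assigned)
    p_name: str                     # pName (Source)
    species: str                    # Species (Source)
    expt_status: ExptStatus         # Expt. Status (Assigned)
    prot_seq: str                   # ProtSeq (Source)
    complete: bool = True           # Completeness: Complete/Incomplete (Assigned)
    literature: List[str] = field(default_factory=list)  # PubMed IDs (Source)
    stat_dom: List[str] = field(default_factory=list)    # STAT Dom (Source)
    chrom_loc: Optional[str] = None                      # ChromLoc (Source)

    @property
    def seq_len(self) -> int:
        """SeqLen, computed here from the protein sequence."""
        return len(self.prot_seq)


# Placeholder values only; this is not a real STATdb entry.
example = STATdbRecord(
    statdb_id="STAT_XXXXX",
    g_name="geneX",
    p_name="hypothetical STAT protein",
    species="Example species",
    expt_status=ExptStatus.P,
    prot_seq="MACDEFGHIK",
    stat_dom=["STAT_int", "STAT_alpha", "STAT_bind", "STAT_sh2"],
)
print(example.statdb_id, example.seq_len, example.expt_status.value)
```

Deriving `seq_len` from the stored sequence, rather than storing it separately, is one design choice that keeps the two fields from drifting apart; a loader that copies SeqLen verbatim from the source record would work equally well.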
Figure 1. A sample STATdb record.
Figure 2. Snapshots of selected key features of STATdb.
A) STATome Browser: allows dynamic browsing of the STATome, the complete set of reported STAT records in STATdb. B) Contribute: provides a platform for the STATdb community to curate annotations or submit new STAT sequences. C) Classification: provides a notation describing the grouping of a sequence under the three-tier classification system "Domain Architecture - Lineage - Function". D) Predict: characterizes protein sequences using the STATdb classification.
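The three-tier classification can be treated as an ordered triple. The sketch below shows one way to compose such a label and parse it back. It rests on assumptions: the `" - "` separator, the class name `STATdbClassification` and the tier values in the example are hypothetical, and STATdb's real notation may be formatted differently.

```python
from dataclasses import dataclass

# Hypothetical separator between tiers; STATdb's actual notation may differ.
TIER_SEP = " - "


@dataclass(frozen=True)
class STATdbClassification:
    """Three tiers: Domain Architecture - Lineage - Function."""
    domain_architecture: str
    lineage: str
    function: str

    def notation(self) -> str:
        """Join the three tiers into a single label."""
        return TIER_SEP.join((self.domain_architecture, self.lineage, self.function))

    @classmethod
    def parse(cls, label: str) -> "STATdbClassification":
        """Split a label back into its three tiers, rejecting malformed input."""
        parts = [p.strip() for p in label.split(TIER_SEP)]
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected 3 non-empty tiers, got {label!r}")
        return cls(*parts)


# Placeholder tier values for illustration only.
c = STATdbClassification("int+alpha+bind+sh2", "LineageA", "FunctionB")
label = c.notation()
print(label)  # int+alpha+bind+sh2 - LineageA - FunctionB
assert STATdbClassification.parse(label) == c
```

Making the class frozen lets a classification serve as a dictionary key, for example when grouping records by their full three-tier label.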
Figure 3. STATdbPredict output report page for STAT_00001.
The alignment is cropped to save space.
Performance measures of STATdbPredict (RPS-BLAST and BLASTp) versus standalone BLASTp search.
A. STATdbPredict search

| TP | FN | TN | FP | Accuracy (%) | Sensitivity (%) | Specificity (%) |
|---|---|---|---|---|---|---|
| 85 | 11 | 96 | 0 | 94.27 | 88.54 | 100 |
| 95 | 1 | 96 | 0 | 99.48 | 98.96 | 100 |
| 90 | 6 | 96 | 0 | 96.88 | 93.75 | 100 |
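The percentages follow the standard definitions: accuracy = (TP + TN) / (TP + FN + TN + FP), sensitivity = TP / (TP + FN), and specificity = TN / (TN + FP), where TP, FN, TN and FP count true positives, false negatives, true negatives and false positives. Every row sums to 96 positives (TP + FN) and 96 negatives (TN + FP). The short Python sketch below recomputes each row from its counts. The function name `performance` is illustrative and is not part of STATdb.

```python
from typing import Tuple


def performance(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) in percent, rounded to 2 dp."""
    accuracy = 100 * (tp + tn) / (tp + fn + tn + fp)  # correct calls / all calls
    sensitivity = 100 * tp / (tp + fn)                # true positive rate
    specificity = 100 * tn / (tn + fp)                # true negative rate
    return round(accuracy, 2), round(sensitivity, 2), round(specificity, 2)


# (TP, FN, TN, FP) for the three rows of panel A.
rows = [(85, 11, 96, 0), (95, 1, 96, 0), (90, 6, 96, 0)]
for row in rows:
    print(row, performance(*row))
```

Because FP is 0 in every row, specificity is 100% throughout, so the rows differ only in how many true STAT sequences are missed (FN).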