Cheng Zeng, Weihua Zhan, Lei Deng.
Abstract
Annotating individual domains with functional terms is essential for understanding the functions of full-length proteins. We describe SDADB, a functional annotation database for structural domains. SDADB provides associations between Gene Ontology (GO) terms and SCOP domains, calculated with an integrated framework. Each GO annotation is assigned a probability of being correct, estimated with a Bayesian network that combines structural neighborhood mappings, SCOP-InterPro domain mapping information, position-specific scoring matrices (PSSMs) and sequence homolog features. The most substantial contribution comes from high-coverage structure-based domain-protein mappings, which are computed using large-scale structure alignment. SDADB contains ontological terms with probabilistic scores for more than 214 000 distinct SCOP domains. It also provides additional features, including 3D structure alignment visualization, a GO hierarchical tree view, and search, browse and download options. Database URL: http://sda.denglab.org.
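The abstract does not specify the structure of the Bayesian network used to integrate the four evidence sources. As a minimal sketch, assuming the simplest such network (a naive Bayes model in which the sources are conditionally independent given whether an annotation is correct), the integration step could look like the following. The source names reuse the method labels from the comparison table; `SourceModel`, `posterior_correct` and all likelihood values are hypothetical and are not SDADB's fitted parameters.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class SourceModel:
    """Hypothetical per-source likelihoods for one GO term (assumed, not from SDADB)."""
    p_hit_given_correct: float    # P(source predicts the term | term is correct)
    p_hit_given_incorrect: float  # P(source predicts the term | term is incorrect)


def posterior_correct(prior: float,
                      models: Dict[str, SourceModel],
                      hits: Dict[str, bool]) -> float:
    """P(annotation correct | evidence), assuming sources are independent given correctness."""
    odds = prior / (1.0 - prior)
    for name, m in models.items():
        if hits.get(name, False):
            # Source reported the term: multiply by the positive likelihood ratio
            odds *= m.p_hit_given_correct / m.p_hit_given_incorrect
        else:
            # Source did not report the term: multiply by the negative likelihood ratio
            odds *= (1.0 - m.p_hit_given_correct) / (1.0 - m.p_hit_given_incorrect)
    return odds / (1.0 + odds)


# Illustrative numbers only; a real model would estimate them from training data.
models = {
    "Str": SourceModel(0.80, 0.05),
    "Blast": SourceModel(0.75, 0.10),
    "PSSM": SourceModel(0.50, 0.20),
    "IPR": SourceModel(0.55, 0.10),
}
print(posterior_correct(0.05, models, {"Str": True, "Blast": True}))
```

Working in odds space keeps each source's contribution a single multiplicative likelihood ratio, which makes it easy to see which source dominates; a full Bayesian network could instead model dependencies between sources.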
Year: 2018 PMID: 29961821 PMCID: PMC6025185 DOI: 10.1093/database/bay064
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. Flowchart of SDADB construction. For each domain in the SCOPe database, GO annotations are predicted with four component methods: (i) GO annotations predicted using P2D mappings: protein-SCOP domain mappings are calculated by large-scale structure alignment, and then the probability that a domain is annotated with a specific function is computed; (ii) GO annotations predicted using SCOP-InterPro mappings: we use InterProScan to search for InterPro domains in the SCOP domain, and transfer the annotations of these InterPro domains in the InterPro2GO database to the target SCOP domain; (iii) GO annotations predicted using PSSM profiles: SVM models for GO function annotation are trained on fixed-length PSSM feature vectors obtained with the auto cross covariance (ACC) transformation; (iv) GO annotations predicted using sequence homologs: we transfer the GO annotations of the sequence homologs in UniProt-GOA to the target SCOP domain. Finally, the SDADB database is built by integrating the outputs of the four component methods with a Bayesian network.
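Component (iii) relies on the ACC transformation to turn a variable-length L × 20 PSSM into a fixed-length vector that an SVM can consume. A minimal pure-Python sketch of the commonly used ACC formulation follows. The maximum lag `max_lag` is a hypothetical parameter (the caption does not state the value SDADB uses), and any PSSM normalization applied beforehand is omitted.

```python
import random
from typing import List


def acc_transform(pssm: List[List[float]], max_lag: int) -> List[float]:
    """Auto cross covariance (ACC) transform of an L x 20 PSSM.

    Returns n_cols * n_cols * max_lag values (800 for 20 columns and lag 2),
    independent of the sequence length L.
    """
    length = len(pssm)
    if length <= max_lag:
        raise ValueError("sequence must be longer than max_lag")
    n_cols = len(pssm[0])
    # Mean score of each PSSM column over all residue positions
    means = [sum(row[c] for row in pssm) / length for c in range(n_cols)]
    features: List[float] = []
    for lag in range(1, max_lag + 1):
        for c1 in range(n_cols):
            for c2 in range(n_cols):
                # c1 == c2 gives the auto covariance term; c1 != c2 the cross covariance term
                total = 0.0
                for j in range(length - lag):
                    total += (pssm[j][c1] - means[c1]) * (pssm[j + lag][c2] - means[c2])
                features.append(total / (length - lag))
    return features


# Two random stand-in "PSSMs" of different lengths map to vectors of equal length.
random.seed(0)
short = [[random.uniform(-5, 5) for _ in range(20)] for _ in range(30)]
long_ = [[random.uniform(-5, 5) for _ in range(20)] for _ in range(120)]
print(len(acc_transform(short, 2)), len(acc_transform(long_, 2)))  # 800 800
```

Because each feature is an average over residue pairs separated by a fixed lag, the vector length depends only on the number of columns and `max_lag`, which is what lets domains of different lengths share one SVM input space.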
Figure 2. A snapshot of the SDADB web interface. (A) The GO annotations of a query domain are listed. (B) The GO tree view shows the hierarchical architecture of GO for the query domain.
Figure 3. The structure alignment view for the domain-protein mappings.
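Showing a query domain's annotations as a GO tree amounts to collecting each predicted term together with all of its ancestors in the GO directed acyclic graph. The sketch below assumes the parent relations (for example `is_a` links) have already been parsed into a dictionary; the function name `with_ancestors` and the term identifiers in the example are placeholders, not real GO IDs or SDADB code.

```python
from typing import Dict, Iterable, Set


def with_ancestors(terms: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the given GO terms plus all ancestors reachable through parent links."""
    closure: Set[str] = set()
    stack = list(terms)
    while stack:
        term = stack.pop()
        if term in closure:
            continue  # already reached via another path in the DAG
        closure.add(term)
        stack.extend(parents.get(term, ()))
    return closure


# Placeholder term IDs; a real run would parse parents from the GO ontology file.
toy_parents = {
    "GO:leaf": {"GO:mid_a", "GO:mid_b"},
    "GO:mid_a": {"GO:root"},
    "GO:mid_b": {"GO:root"},
    "GO:root": set(),
}
print(sorted(with_ancestors(["GO:leaf"], toy_parents)))
# ['GO:leaf', 'GO:mid_a', 'GO:mid_b', 'GO:root']
```

The visited-set check matters because GO is a DAG rather than a tree: a term can be reached through several parents, and each ancestor should appear once.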
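Prediction performance comparison of SDADB with the four component methods (Str: structure-based P2D mappings; Blast: sequence homologs; PSSM: PSSM profiles; IPR: SCOP-InterPro mappings)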
| Methods | Coverage | | | |
|---|---|---|---|---|
| Str | 0.802 | 0.806 | 0.690 | 0.767 |
| Blast | 0.918 | 0.755 | 0.616 | 0.687 |
| PSSM | 0.509 | 0.491 | 0.364 | 0.507 |
| IPR | 0.555 | 0.711 | 0.508 | 0.263 |
| SDADB | | | | |
Figure 4. Precision–recall curve of SDADB versus existing methods for molecular function (A) and biological process (B).
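The evaluation protocol behind these curves is not described here. As a sketch under stated assumptions, the code below traces precision–recall points by sweeping a threshold over predicted probabilities and counting pooled (domain, GO term) pairs; the published evaluation may instead average per domain or propagate terms to ancestors first. The function name and all identifiers in the example data are hypothetical.

```python
from typing import Iterable, List, Set, Tuple

# (domain_id, go_term, score) triples; identifiers below are placeholders.
Prediction = Tuple[str, str, float]


def precision_recall_points(predictions: Iterable[Prediction],
                            truth: Set[Tuple[str, str]],
                            thresholds: Iterable[float]) -> List[Tuple[float, float, float]]:
    """Pooled (threshold, precision, recall) points over (domain, term) pairs."""
    if not truth:
        raise ValueError("recall is undefined without true annotations")
    preds = list(predictions)
    points = []
    for t in thresholds:
        kept = {(d, g) for d, g, score in preds if score >= t}
        if not kept:
            continue  # precision is undefined when nothing passes the threshold
        tp = len(kept & truth)
        points.append((t, tp / len(kept), tp / len(truth)))
    return points


preds = [("d1", "GO:a", 0.9), ("d1", "GO:b", 0.4), ("d2", "GO:a", 0.7)]
truth = {("d1", "GO:a"), ("d2", "GO:c")}
for point in precision_recall_points(preds, truth, [0.95, 0.8, 0.5, 0.3]):
    print(point)
```

Lowering the threshold admits more predictions, so recall can only stay flat or rise while precision typically falls; plotting these points for each method gives curves like those in the figure.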