Kristian Vlahovicek, László Kaján, Vilmos Agoston, Sándor Pongor.
Abstract
SBASE (http://www.icgeb.trieste.it/sbase) is an online resource designed to facilitate the detection of domain homologies based on sequence database search. The present release of the SBASE A library of protein domain sequences contains 972,397 protein sequence segments annotated by structure, function, ligand-binding or cellular topology, clustered into 8547 domain groups. SBASE B contains 169,916 domain sequences clustered into 2526 less well-characterized groups. Domain prediction is based on an evaluation of database search results in comparison with a 'similarity network' of inter-sequence similarity scores, using support vector machines trained on similarity search results of known domains.
Year: 2005 PMID: 15608182 PMCID: PMC540066 DOI: 10.1093/nar/gki112
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Separation of domain group members from neighbors in three dimensions. The kringle group is one of the perfectly separated groups; the WD repeat group is one of the critical cases. Lhsp = length of the HSP, Lsbj = length of the subject (database entry), S/Sself = score coverage; see text for explanations.
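As a rough illustration of the three quantities named in the caption, the sketch below assembles them into one feature vector per database hit. The `Hit` class, its field names, and the reading of Sself as the score of the subject aligned against itself are assumptions made for this example; the record does not give the authors' exact definitions or implementation.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One database hit; field names are hypothetical, chosen to mirror the caption."""
    hsp_length: int      # Lhsp: length of the high-scoring segment pair
    subject_length: int  # Lsbj: length of the subject (database entry)
    score: float         # S: similarity score of the hit
    self_score: float    # Sself: assumed here to be the subject's self-alignment score


def feature_vector(hit: Hit) -> tuple:
    """Return (Lhsp, Lsbj, S/Sself) for one hit."""
    if hit.self_score <= 0:
        raise ValueError("self_score must be positive to compute S/Sself")
    return (hit.hsp_length, hit.subject_length, hit.score / hit.self_score)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    print(feature_vector(Hit(hsp_length=80, subject_length=100, score=150.0, self_score=200.0)))
```

Vectors of this kind are what a classifier such as a support vector machine would take as input; the training step itself is not sketched here.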
SVM benchmark figures (see footnote a)
| Domain group | No. of sequences, learning set | No. of sequences, test set | Match | Mismatch | Unpredicted |
|---|---|---|---|---|---|
| Kringle domain | 24 | 9 | 9 | 0 | 0 |
| Fibronectin type III domain | 108 | 352 | 328 | 2 | 22 |
| WD repeat | 1924 | 673 | 542 | 12 | 125 |
| EGF-like domain | 87 | 290 | 262 | 13 | 15 |
| Protein kinase domain | 67 | 545 | 505 | 5 | 35 |
| Annexin repeat | 181 | 34 | 32 | 1 | 2 |
| Sushi domain | 80 | 119 | 103 | 0 | 3 |
| Trypsin family | 83 | 128 | 110 | 0 | 1 |
| Globin family | 79 | 59 | 57 | 1 | 0 |
| ABC transporter domain | 63 | 564 | 563 | 1 | 0 |
| Ank repeat | 1195 | 736 | 535 | 5 | 196 |
| Total | 128 780 | 60 457 | 56 891 | 238 | 3328 |
a. The learning set consisted of the parent protein sequences of domains in PFAM-SEED 8.0. The test set included parent proteins with annotated domains not included in PFAM-SEED.
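The benchmark counts can be turned into per-group match rates with a few lines of Python. This is a minimal sketch that copies the test-set figures from the table as given; defining the match rate as Match divided by test-set size is an assumption of this example, not a metric stated in the record, and the names `ROWS`, `match_rate` and `summarize` are illustrative.

```python
# (domain group, test set size, match, mismatch, unpredicted), copied from the table.
ROWS = [
    ("Kringle domain", 9, 9, 0, 0),
    ("Fibronectin type III domain", 352, 328, 2, 22),
    ("WD repeat", 673, 542, 12, 125),
    ("EGF-like domain", 290, 262, 13, 15),
    ("Protein kinase domain", 545, 505, 5, 35),
    ("Annexin repeat", 34, 32, 1, 2),
    ("Sushi domain", 119, 103, 0, 3),
    ("Trypsin family", 128, 110, 0, 1),
    ("Globin family", 59, 57, 1, 0),
    ("ABC transporter domain", 564, 563, 1, 0),
    ("Ank repeat", 736, 535, 5, 196),
    ("Total", 60457, 56891, 238, 3328),
]


def match_rate(test_size: int, match: int) -> float:
    """Fraction of test-set sequences counted as Match (assumed definition)."""
    return match / test_size


def summarize(rows):
    """Map each domain group to its match rate."""
    return {name: match_rate(test, match) for name, test, match, _, _ in rows}


if __name__ == "__main__":
    for name, rate in summarize(ROWS).items():
        print(f"{name}: {rate:.1%}")
```

Under this definition the Total row corresponds to a match rate of about 94.1% (56 891 of 60 457).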
Figure 2. Domain boundary prediction statistics available at the SBASE homepage.