Svetlana Kirillova, Oliviero Carugo.
Abstract
BACKGROUND: Accurate and fast tools for comparing protein three-dimensional structures are necessary to scan and analyze large data sets.
Year: 2008 PMID: 18710497 PMCID: PMC2535597 DOI: 10.1186/1756-0500-1-44
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Table 1. Composition of the datasets and query lists used to test PRIDE
| Dataset | Number of domains in the dataset | Number of histograms used for the domain structure representation | Number of domains in the query list |||||||
| | | | E* ||| D** ||| Total |
|---|---|---|---|---|---|---|---|---|---|
| 1 | 29 098 | > 30 | 24 | 25 | 25 | 25 | 25 | 25 | 149 |
| 2 | 4 937 | 10–30 | 6 | 6 | 6 | 8 | 8 | 8 | 42 |
*E: "easy" cases, in which the query belongs to a highly populated group of the investigated dataset, containing at least 50 domains at the homologous superfamily level of the CATH classification;
**D: "difficult" cases, in which the query belongs to a small group containing no more than 3 domains at the homologous superfamily level of the CATH classification.
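The footnotes define the easy/difficult split purely by how many domains share the query's CATH homologous superfamily within the dataset. The following is a minimal sketch of that rule, assuming each domain is supplied with a superfamily code in a plain dictionary; the function name `classify_queries` and the input format are hypothetical, and the superfamily codes in the toy data are placeholders, not real CATH identifiers. Domains from groups of 4 to 49 members fall into neither category.

```python
from collections import Counter

EASY_MIN = 50      # easy (E): superfamily group holds at least 50 domains
DIFFICULT_MAX = 3  # difficult (D): superfamily group holds no more than 3 domains


def classify_queries(superfamily_of):
    """Label each domain 'E', 'D', or None by the population of its superfamily.

    superfamily_of maps a domain ID to its CATH homologous superfamily code.
    This input format is a hypothetical choice made for this sketch.
    """
    sizes = Counter(superfamily_of.values())  # domains per superfamily
    labels = {}
    for domain, sf in superfamily_of.items():
        n = sizes[sf]
        if n >= EASY_MIN:
            labels[domain] = "E"
        elif n <= DIFFICULT_MAX:
            labels[domain] = "D"
        else:
            labels[domain] = None  # intermediate group size: neither E nor D
    return labels


# Toy dataset with placeholder superfamily codes (not real CATH data).
toy = {f"a{i}": "sfA" for i in range(50)}        # 50 members -> E
toy.update({"b0": "sfB", "b1": "sfB"})           # 2 members -> D
toy.update({f"c{i}": "sfC" for i in range(10)})  # 10 members -> None
print(Counter(classify_queries(toy).values()))
# Counter({'E': 50, None: 10, 'D': 2})
```

The sketch only identifies which domains are eligible as easy or difficult queries; how the specific query lists in Table 1 were drawn from those pools is not described in this record.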
Figure 1. ROC curves. The solid line shows the ROC curve obtained by comparing the 149 query CATH domains with the 29 098 CATH entries of the first dataset of Table 1, which contains large protein domains; the dashed line shows the ROC curve obtained by comparing the 42 small query CATH domains with the 4 937 CATH entries of the second dataset of Table 1, which contains small protein domains.
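Figure 1 summarizes retrieval performance as ROC curves. The sketch that follows shows one standard way to compute such a curve and its area from pairwise comparison scores. It assumes, since this record does not state the criterion, that every query-entry comparison yields a similarity score where higher means more similar, and that a pair counts as positive when both domains belong to the same CATH homologous superfamily. The function names `roc_points` and `roc_auc` are hypothetical.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points by sweeping a threshold over descending scores.

    scores: similarity values, higher meaning more similar (assumed convention).
    labels: True for same-superfamily pairs (assumed positive criterion).
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both positive and negative pairs")
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        s = pairs[i][0]
        # Consume every pair tied at this score before emitting a point.
        while i < len(pairs) and pairs[i][0] == s:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def roc_auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Toy scores for four query-entry pairs; True marks a same-superfamily pair.
pts = roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
print(pts)           # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(roc_auc(pts))  # 0.75
```

Tied scores are consumed together before a point is emitted, so a tie between a positive and a negative yields a diagonal segment instead of an order-dependent step. For comparison sets as large as those in Table 1, a tested library routine such as scikit-learn's `roc_curve` would be the practical choice.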