Ursula Pieper, Narayanan Eswar, Ben M Webb, David Eramian, Libusha Kelly, David T Barkan, Hannah Carter, Parminder Mankoo, Rachel Karchin, Marc A Marti-Renom, Fred P Davis, Andrej Sali.
Abstract
MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/).
Year: 2008 PMID: 18948282 PMCID: PMC2686492 DOI: 10.1093/nar/gkn791
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
MODBASE datasets
| Dataset/Project | Taxonomy ID | No. of Transcripts | No. of Sequences modeled | No. of Models | Sequence source |
|---|---|---|---|---|---|
| Genomes (*genomes for the TDI) | | | | | |
| Archaea | | | | | |
| | 2234 | 2409 | 1794 | 3980 | NCBI |
| | 2190 | 1785 | 1480 | 1707 | NCBI |
| | 160 232 | 536 | 447 | 496 | NCBI |
| | 82 076 | 1535 | 1260 | 2902 | NCBI |
| | 13 773 | 2600 | 1566 | 3497 | NCBI |
| | 2261 | 2113 | 1524 | 3373 | NCBI |
| | 2287 | 2922 | 2006 | 4451 | NCBI |
| | 50 339 | 1497 | 1204 | 2806 | NCBI |
| | | 1480 | 1220 | 2801 | NCBI |
| Bacteria | | | | | |
| | 1423 | 4105 | 3374 | 9245 | NCBI |
| | 13 373 | 4798 | 3910 | 23 219 | NCBI |
| | 1513 | 2413 | 2158 | 5864 | NCBI |
| | 562 | 4206 | 3150 | 5994 | NCBI |
| | 1769 | 1605 | 1178 | 2493 | OrthoMCL-DB |
| | 1773 | 3991 | 2808 | 5913 | TubercuList |
| | 2104 | 687 | 426 | 857 | NCBI |
| | 287 | 5559 | 3806 | 9222 | NCBI |
| | 782 | 835 | 754 | 2136 | NCBI |
| | 282 458 | 2635 | 1184 | 3161 | NCBI |
| | 1314 | 1691 | 1440 | 3984 | NCBI |
| | 953 | 805 | 621 | 1873 | TIGR |
| | 632 | 3882 | 3215 | 8371 | NCBI |
| Eukaryota | | | | | |
| | 3702 | 30 707 | 23 807 | 70 494 | ENSEMBL |
| | 6279 | 11 397 | 7850 | 23 219 | TIGR |
| | 6239 | 22 698 | 18 996 | 52 235 | NCBI |
| | 9615 | 30 264 | 22 614 | 65 617 | ENSEMBL |
| | 237 895 | 3886 | 1614 | 3287 | CryptoDB |
| | 5807 | 3806 | 1918 | 3969 | CryptoDB |
| | | Calculation in progress | | | ENSEMBL |
| | 7227 | 17 104 | 9381 | 24 683 | NCBI |
| | 9606 | 32 010 | 21 270 | 51 084 | OrthoMCL-DB |
| | 5664 | 8274 | 3975 | 8285 | GeneDB |
| | 10 090 | 30 133 | 25 338 | 70 783 | NCBI |
| | | Calculation in progress | | | ENSEMBL |
| | 5833 | 5363 | 2599 | 5053 | PlasmoDB |
| | 5855 | 5342 | 2359 | 4670 | PlasmoDB |
| | | Calculation in progress | | | ENSEMBL |
| | 4932 | 6600 | 3035 | 5543 | NCBI |
| | 6183 | 25 304 | 8576 | 26 076 | GeneDB |
| | 5811 | 7793 | 1530 | 3064 | ToxoDB |
| | 5691 | 9210 | 3900 | 8054 | GeneDB |
| | 5693 | 19 607 | 7390 | 14 858 | GeneDB |
| | 8355 | 27 952 | 25 457 | 69 191 | NCBI |
| Selected projects | | | | | |
| CSMP datasets | | 195 235 | 184 139 | 690 255 | GENPEPT NR |
| NYSGXRC datasets | | 553 537 | 493 672 | 1 415 237 | GENPEPT NR |
| Enzyme Specificity Project | | 15 833 | 10 875 | 183 591 | SFLD/NR |
| ABC Transporter | | 152 | 85 | 85 | |
| GPCR | | 11 586 | 11 551 | 24 272 | |
| UNIPROT Datasets 2005 | | 1 742 816 | 1 025 196 | 2 146 830 | UNIPROT |
| Total (including other datasets) | | 2 608 987 | 1 593 209 | 5 152 695 | |
The sequences were retrieved from ENSEMBL (36), TIGR (50), NCBI-Genbank (6), OrthoMCL-DB (51), TubercuList (52), CryptoDB (53), GeneDB (54), ToxoDB (55), SFLD (56) and UniProt (34).
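The totals in the last row of the table support two simple derived figures: the share of input sequences that received at least one reliable model, and the average number of models per modeled sequence. The short Python sketch below works through that arithmetic. It assumes that the "No. of Transcripts" column counts the sequences submitted to the pipeline; the function and constant names are illustrative and do not come from MODBASE or MODPIPE.

```python
# Totals copied from the "Total (including other datasets)" row of the table.
TOTAL_TRANSCRIPTS = 2_608_987        # assumed: sequences submitted for modeling
TOTAL_SEQUENCES_MODELED = 1_593_209  # sequences with at least one reliable model
TOTAL_MODELS = 5_152_695             # reliable models across all datasets


def modeling_coverage(sequences_modeled: int, transcripts: int) -> float:
    """Fraction of input sequences that have at least one reliable model."""
    if transcripts <= 0:
        raise ValueError("transcripts must be positive")
    return sequences_modeled / transcripts


def models_per_sequence(models: int, sequences_modeled: int) -> float:
    """Average number of models per modeled sequence."""
    if sequences_modeled <= 0:
        raise ValueError("sequences_modeled must be positive")
    return models / sequences_modeled


coverage = modeling_coverage(TOTAL_SEQUENCES_MODELED, TOTAL_TRANSCRIPTS)
per_sequence = models_per_sequence(TOTAL_MODELS, TOTAL_SEQUENCES_MODELED)

print(f"Coverage: {coverage:.1%}")
print(f"Models per modeled sequence: {per_sequence:.2f}")
```

Under that assumption, roughly three in five input sequences have a reliable model, and each modeled sequence has a little over three models on average, which is consistent with models being built per domain rather than per full-length sequence.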
Figure 1. MODBASE Model Details page (Example Q9NP58 from the human genome dataset): this page provides links to all models for this specific sequence. A ribbon diagram of the primary model, database annotations and modeling details are displayed. Links to additional models for different target regions or models from other datasets are displayed as thumbprints. The pull-down menu provides access to alternative MODBASE views and other types of information (if available), such as data about mutations and putative ligand binding sites. The cross-references section contains links to relevant internal and external databases. For this particular sequence, mutation data are available from LS-Mut, LS-SNP and ABC SNPs.