Stefan Bienert, Andrew Waterhouse, Tjaart A P de Beer, Gerardo Tauriello, Gabriel Studer, Lorenza Bordoli, Torsten Schwede.
Abstract
SWISS-MODEL Repository (SMR) is a database of annotated 3D protein structure models generated by the automated SWISS-MODEL homology modeling pipeline. It currently holds >400 000 high quality models covering almost 20% of Swiss-Prot/UniProtKB entries. In this manuscript, we provide an update of features and functionalities which have been implemented recently. We address improvements in target coverage, model quality estimates, functional annotations and in-page visualization. We also introduce a new update concept which includes regular updates of an expanded set of core organism models and UniProtKB-based targets, complemented by user-driven on-demand update of individual models. With the new release of the modeling pipeline, SMR has implemented a REST-API and adopted an open licencing model for accessing model coordinates, thus enabling bulk download for groups of targets and fostering re-use of models in other contexts. SMR can be accessed at https://swissmodel.expasy.org/repository.
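The REST-API mentioned in the abstract could be queried along the following lines. This is a minimal sketch, not the documented interface: the `/uniprot/<accession>.<format>` path layout and the helper names `smr_entry_url` and `fetch_smr_entry` are assumptions that should be checked against the SMR API documentation. Only the base URL comes from the abstract, and Q18953 is the UniProtKB entry shown in Figure 2.

```python
import json
import urllib.request

# Base URL as given in the abstract.
SMR_BASE = "https://swissmodel.expasy.org/repository"


def smr_entry_url(accession: str, fmt: str = "json") -> str:
    """Build a repository URL for one UniProtKB accession.

    The '/uniprot/<accession>.<fmt>' layout is an assumption, not taken
    from the text.
    """
    return f"{SMR_BASE}/uniprot/{accession.strip().upper()}.{fmt}"


def fetch_smr_entry(accession: str) -> dict:
    """Download and decode the JSON summary for one accession (needs network access)."""
    with urllib.request.urlopen(smr_entry_url(accession), timeout=30) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only builds the URL; call fetch_smr_entry("Q18953") to actually query the server.
    print(smr_entry_url("Q18953"))
```

Bulk download for a group of targets would then amount to looping over a list of accessions, ideally with a pause between requests.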
Year: 2016 PMID: 27899672 PMCID: PMC5210589 DOI: 10.1093/nar/gkw1132
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Structural coverage of the human proteome. The plot shows how structural information for the residues of the Homo sapiens reference proteome (y-axis) has developed over time (adapted from (1)). Profiles were generated for each protein sequence in the reference data set based on the NR20 database and used to search the list of protein sequences in PDB using HHblits (16). For each residue, the highest sequence identity for any alignment to an experimental structure available in a given year was recorded. Different colors in the plot represent the quality of the sequence alignment between the reference proteome sequences (targets) and the sequences of the protein structure database (templates). Alignments with low sequence identity are displayed in light blue, whereas alignments with high sequence identity are depicted in dark blue.
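The per-residue bookkeeping described in the Figure 1 caption can be sketched as follows. This is an illustration under stated assumptions rather than the authors' pipeline: the `Alignment` record, its fields and the function name are hypothetical, and each alignment is simplified to an ungapped residue range with a single sequence identity value.

```python
from typing import Iterable, List, NamedTuple


class Alignment(NamedTuple):
    """Hypothetical, simplified target-template alignment."""
    start: int          # 0-based index of the first covered target residue
    end: int            # index one past the last covered target residue
    identity: float     # sequence identity of the alignment, in percent
    template_year: int  # year the template structure became available


def best_identity_per_residue(
    target_length: int, alignments: Iterable[Alignment], year: int
) -> List[float]:
    """Highest identity per residue over templates available in `year` or earlier."""
    best = [0.0] * target_length  # 0.0 means no template covers the residue
    for aln in alignments:
        if aln.template_year > year:
            continue  # template not yet available in the given year
        for i in range(max(aln.start, 0), min(aln.end, target_length)):
            best[i] = max(best[i], aln.identity)
    return best


if __name__ == "__main__":
    alns = [Alignment(0, 3, 30.0, 2000), Alignment(2, 5, 90.0, 2010)]
    print(best_identity_per_residue(5, alns, 2005))
    print(best_identity_per_residue(5, alns, 2015))
```

Repeating the call for each year and binning the resulting values by identity would give the kind of stacked profile the caption describes.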
Figure 2. The SWISS-MODEL Repository web page for UniProtKB entry Q18953 (O-phosphoseryl-tRNA selenium transferase). The circular representation in section 1 shows the coverage of the target sequence with models and experimental structures. Protein features are annotated on the outside of the arc, in this specific case one InterPro domain and a site annotation. Details about the selected model are shown above the model quality plots in section 2. A Sequence Features drop-down menu reveals a detailed list of feature annotations from UniProt which can be mapped interactively on the model. In section 3, the four chains of the homo-4-mer are highlighted on the 3D model (displayed in PV). The Alignment, Summary, and Colour configuration sections are directly below the protein arc in section 4. The example shown here represents the status of the database at the time of writing and may change over time, e.g. when a new template or better model becomes available.
Statistics for each of the core species’ canonical sequence sets.
| Species | # of sequences in ref. proteome | # of sequences with ≥1 model | ≥80% | ≥60% | ≥40% | ≥20% | <20% | No template |
|---|---|---|---|---|---|---|---|---|
|  | 21 006 | 15 195 | 5010 | 3299 | 2673 | 2648 | 1965 | 5411 |
|  | 22 274 | 16 860 | 6331 | 3337 | 2789 | 2765 | 1638 | 5414 |
|  | 20 071 | 10 566 | 3489 | 1917 | 1852 | 2026 | 1282 | 9505 |
|  | 4306 | 3306 | 2620 | 274 | 211 | 132 | 69 | 1000 |
|  | 27 252 | 17 132 | 6544 | 3335 | 2772 | 3004 | 1477 | 10 120 |
|  | 13 704 | 8502 | 2956 | 1488 | 1309 | 1525 | 1224 | 5202 |
|  | 6721 | 4101 | 1665 | 590 | 550 | 751 | 545 | 2620 |
|  | 3715 | 2633 | 1914 | 303 | 191 | 154 | 71 | 1082 |
|  | 3987 | 2921 | 2000 | 338 | 275 | 205 | 103 | 1066 |
|  | 5550 | 4270 | 3271 | 405 | 311 | 199 | 84 | 1280 |
|  | 2881 | 1925 | 1500 | 170 | 117 | 94 | 44 | 956 |
|  | 5340 | 2781 | 722 | 352 | 382 | 535 | 790 | 2559 |
For each species we show the total number of canonical sequences in the reference proteome (according to UniProtKB), the number of sequences for which we have at least one model, followed by the number of sequences that have models that cover at least 80% (60%, 40%, etc.) of the respective reference sequence.
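The arithmetic behind a table row can be made explicit with a short sketch. It uses the second and third data rows of the table above and assumes, going by the numbers rather than by anything stated in the text, that the five coverage columns are disjoint bins that add up to the number of sequences with at least one model. The `Row` type and function names are hypothetical.

```python
from typing import NamedTuple, Tuple


class Row(NamedTuple):
    """One table row, without the species name."""
    total: int                                    # sequences in the reference proteome
    with_model: int                               # sequences with at least one model
    coverage_bins: Tuple[int, int, int, int, int] # >=80%, >=60%, >=40%, >=20%, <20%
    no_template: int                              # sequences without a template


def is_consistent(row: Row) -> bool:
    """Check that the bins sum to with_model and that with_model + no_template == total."""
    return (
        sum(row.coverage_bins) == row.with_model
        and row.with_model + row.no_template == row.total
    )


def modelled_fraction(row: Row) -> float:
    """Fraction of reference proteome sequences that have at least one model."""
    return row.with_model / row.total


# Values copied from the second and third data rows of the table.
ROW_2 = Row(22274, 16860, (6331, 3337, 2789, 2765, 1638), 5414)
ROW_3 = Row(20071, 10566, (3489, 1917, 1852, 2026, 1282), 9505)

if __name__ == "__main__":
    for row in (ROW_2, ROW_3):
        print(is_consistent(row), f"{modelled_fraction(row):.1%}")
```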