Damiano Piovesan, Alexander Miguel Monzon, Federica Quaglia, Silvio C E Tosatto.
Abstract
Intrinsically disordered regions (IDRs) lacking a fixed three-dimensional protein structure are widespread and play a central role in cell regulation. Only a small fraction of IDRs have been functionally characterized, with heterogeneous experimental evidence that is largely buried in the literature. Predictions of IDRs are still difficult to estimate and are poorly characterized. Here, an overview of the publicly available knowledge about IDRs is reported, including manually curated resources, deposition databases and prediction repositories. The types, scopes and availability of the various resources are analyzed, and their complementarity and overlap are highlighted. The volume of information included and the relevance to the field of structural biology are compared.
Keywords: databases; flexible proteins; intrinsically disordered proteins; protein ensembles
Year: 2022 PMID: 35102880 PMCID: PMC8805306 DOI: 10.1107/S2059798321012109
Source DB: PubMed Journal: Acta Crystallogr D Struct Biol ISSN: 2059-7983 Impact factor: 7.652
Figure 1. Overview of IDP/IDR data and the respective databases. The databases are organized according to the type of IDP/IDR data stored: sequence, binding regions and structural data. In the top part, examples are shown for each category using part of the N-terminal region of the human p53 protein (UniProt ID P04637), its MDM2-binding short linear motif (ELM accession ELME000184 and PDB entry 1ycr, chain B, red) and the corresponding structural ensemble (PED ID PED00037e000). Curated databases are indicated with a check mark, deposition databases with a database icon and databases with predicted data with a machine-learning icon. Databases aggregating data from different sources have a light blue background. Created with BioRender (https://biorender.com/).
Manually curated intrinsic disorder databases
The name and URL are provided for each database. Creation date corresponds to the year of the first publication describing the resource. The numbers of proteins with intrinsically disordered regions (IDRs) and linear interacting peptides (LIPs) are reported based on the database websites. Notice that some databases provide only IDRs or LIPs. Data were collected in October 2021.
| Name | URL | Creation date | IDRs: proteins | IDRs: content (%) | LIPs: proteins | LIPs: content (%) |
|---|---|---|---|---|---|---|
| Pfam | | 1997 | 39 | >80 | — | — |
| UniProtKB | | 2002 | 475 | 13.8 | — | — |
| ELM | | 2003 | — | — | 3542 | 1.4 |
| DisProt | | 2005 | 1746 | 20.5 | 729 | 19.3 |
| IDEAL | | 2012 | 995 | 10.3 | 317 | 8.9 |
| FuzDB | | 2016 | 110 | 16.6 | — | — |
| MFIB | | 2017 | — | — | 205 | 24.7 |
| DIBS | | 2017 | — | — | 772 | 4.1 |
Intrinsically disordered domain families available in Pfam release 34.
UniProtKB proteins are those with at least one manually curated IDR. The release year indicated is when the UniProtKB consortium (Swiss-Prot, TrEMBL and PIR) was launched; however, the release year of Swiss-Prot is 1986.
FuzDB annotates IDRs that form protein complexes retaining conformational heterogeneity.
Intrinsic disorder prediction databases
Columns are the same as in Table 1. The ‘Proteins’ column indicates the total number of database proteins, while the ‘Annotated’ columns indicate proteins with at least one IDR or LIP. ‘MobiDB curated’ includes data from the combined DisProt, IDEAL, FuzDB, ELM, UniProtKB, DIBS and MFIB databases. ‘MobiDB derived’ entries are missing residues in PDB structures. ‘MobiDB predicted’ lists IDRs predicted by MobiDB-lite and LIPs predicted by ANCHOR. IDR and LIP content is the fraction of annotated residues in proteins with at least one annotated region (MobiDB statistics release 2020_09). D2P2 provides only IDR annotations (as shown on the website) and statistics at the residue level are not available (NA). InterPro proteins are those matching disordered Pfam domains. Disorder content is calculated based on the residues covered by Pfam models flagged as intrinsically disordered. Notice that AlphaFoldDB (queried on 26 July 2021) is growing daily until it covers all UniRef90 proteins. Data were collected in October 2021.
| Name | URL | Creation date | Proteins | IDRs: annotated | IDRs: content (%) | LIPs: annotated | LIPs: content (%) |
|---|---|---|---|---|---|---|---|
| MobiDB curated | | 2012 | NA | 2074 | 16.7 | 2871 | 5.8 |
| MobiDB derived | | 2012 | NA | 35136 | 6.0 | 8979 | 5.8 |
| MobiDB predicted | | 2012 | 189525031 | 187222768 | 12.1 | 111772244 | 10.5 |
| | | 2012 | 189525031 | 38542336 | 17.1 | — | — |
| D2P2 | | 2014 | 10429761 | NA | NA | — | — |
| InterPro | | 2001 | 219740214 | 233001 | 26.2 | — | — |
| AlphaFoldDB | | 2021 | 362094 | NA | NA | — | — |
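The ‘Content (%)’ definition given in the caption (the fraction of annotated residues in proteins with at least one annotated region) can be illustrated with a small sketch. This is not the MobiDB implementation: the function names, the input layout and the toy data are hypothetical, and the sketch assumes that residues are pooled across proteins rather than averaged per protein, which the caption does not specify.

```python
from typing import Dict, List, Tuple

# A region is a 1-based, inclusive (start, end) residue range (assumed convention).
Region = Tuple[int, int]


def annotated_residues(regions: List[Region]) -> int:
    """Count residues covered by at least one region, so overlaps count once."""
    covered = 0
    current_end = 0  # last residue already counted
    for start, end in sorted(regions):
        if end <= current_end:
            continue  # region lies entirely inside what was already counted
        start = max(start, current_end + 1)
        covered += end - start + 1
        current_end = end
    return covered


def content_percent(proteins: Dict[str, Tuple[int, List[Region]]]) -> float:
    """Percentage of annotated residues, restricted to proteins that have
    at least one annotated region (assumption: residues are pooled)."""
    annotated = 0
    total = 0
    for length, regions in proteins.values():
        if not regions:
            continue  # proteins without annotations do not enter the denominator
        annotated += annotated_residues(regions)
        total += length
    return 100.0 * annotated / total if total else 0.0


# Hypothetical toy data: identifier -> (sequence length, annotated regions).
toy = {
    "protA": (100, [(1, 10), (5, 20)]),  # overlapping regions cover 20 residues
    "protB": (50, []),                   # no annotation, excluded
    "protC": (100, [(91, 100)]),         # 10 residues
}
print(content_percent(toy))
```

Excluding unannotated proteins from the denominator is why the content values in the table cannot be multiplied by the ‘Proteins’ column to obtain a total residue count.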
Deposition databases
Deposition databases containing primary data are listed by name and URL. Creation date corresponds to the year of the first publication describing the resource, and all are actively maintained. The number of records corresponds to different depositions of a particular type of data (X-ray, NMR, SAXS, CD etc.). Notice that these databases are redundant, i.e. the same data can be deposited more than once. Manually curated IDR annotations are provided by SASBDB and PED. Data were collected in October 2021.
| Name | URL | Creation date | Records | Type of data |
|---|---|---|---|---|
| PDB | | 1971 | 176247 | X-ray, NMR, cryo-EM |
| BMRB | | 1989 | 14254 | NMR chemical shifts |
| PCDDB | | 2006 | 697 | Circular dichroism |
| SASBDB | | 2014 | 1942 | Small-angle scattering |
| PED | | 2014 | 152 | Integrative modeling (disordered ensembles) |
| PDB-Dev | | 2018 | 58 | Integrative modeling (structured proteins) |
LLPS databases
The name and URL are provided for manually curated LLPS databases. Creation date corresponds to the year of the first publication describing the resource. All databases were developed in the last two years and are currently being maintained. The type of data represented by the total number of records is specified as each database has a different content. Data were collected in October 2021.
| Name | URL | Creation date | Records | Type of data |
|---|---|---|---|---|
| PhaSepDB | | 2019 | 2957 | MLO localization/association |
| MloDisDB | | 2021 | 771 | MLO localization/association and diseases |
| PhaSePro | | 2019 | 121 | Drivers/scaffolds |
| LLPSDB | | 2019 | 1175 | Experiments |
| DrLLPS | | 2019 | 9285 | Clients, regulators, drivers/scaffolds |