Qian Li¹, Xi Wang¹, Zhihui Dou¹, Weishan Yang¹, Beifang Huang¹, Jizhong Lou¹,², Zhuqing Zhang¹.
Abstract
Liquid–liquid phase separation (LLPS) of biomolecules, which underlies the formation of membraneless organelles (MLOs), or biomolecular condensates, has been investigated intensively in recent years. LLPS contributes to the regulation of various physiological processes and to the development of related diseases. A rapidly growing number of studies have focused on the biological functions of LLPS in cells and on the mechanisms that drive and regulate it. Based on the mounting data generated by these investigations, six databases (LLPSDB, PhaSePro, PhaSepDB, DrLLPS, RNAgranuleDB, and HUMAN CELL MAP) have been developed, built directly on LLPS studies or on the identification of MLO components. These resources are invaluable for a deeper understanding of the cellular functions of biomolecular phase separation, as well as for the development of methods to predict and design phase-separating proteins. In this review, we compare the data contents, annotations, and organization of these databases, highlight their unique features, overlaps, and fundamental differences, and discuss their suitable applications.
Keywords: condensates; databases; liquid–liquid phase separation; membraneless organelles; protein
Year: 2020 PMID: 32947964 PMCID: PMC7555049 DOI: 10.3390/ijms21186796
Source DB: PubMed Journal: Int J Mol Sci ISSN: 1422-0067 Impact factor: 5.923
Figure 1. Number of publications on protein LLPS over the past twenty years (through the end of August 2020). The search used the keyword combination “((liquid−liquid phase separation) OR (liquid−liquid phase transition)) AND (protein)” in NCBI PubMed as well as in Web of Science (inset). The red arrows mark the 2009 publication by Brangwynne CP and coworkers.
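For the PubMed half of the search, yearly counts like those in Figure 1 could in principle be collected through NCBI's E-utilities ESearch service. The sketch below only builds the request URLs, one per year, asking for the hit count alone. The ASCII-hyphen query string, the helper names `yearly_count_url` and `yearly_urls`, and the use of publication date (`pdat`) as the year filter are assumptions; the legend does not say how results were binned by year, and Web of Science is not covered.

```python
from urllib.parse import urlencode

# Public NCBI E-utilities ESearch endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Keyword combination from the Figure 1 legend, with ASCII hyphens (assumption).
QUERY = ("((liquid-liquid phase separation) OR "
         "(liquid-liquid phase transition)) AND (protein)")


def yearly_count_url(year: int, term: str = QUERY, end: str = "12/31") -> str:
    """Build an ESearch URL that asks PubMed only for the hit count in one year.

    `end` lets the final year be truncated, e.g. "08/31" for 2020.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",        # restrict by publication date
        "mindate": f"{year}/01/01",
        "maxdate": f"{year}/{end}",
        "rettype": "count",        # return only the count, not the ID list
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


def yearly_urls(first: int, last: int, last_end: str = "08/31") -> dict[int, str]:
    """One URL per year from `first` to `last`, truncating the final year."""
    return {
        y: yearly_count_url(y, end=last_end if y == last else "12/31")
        for y in range(first, last + 1)
    }


if __name__ == "__main__":
    for year, url in yearly_urls(2001, 2020).items():
        print(year, url)
```

Fetching each URL (with NCBI's rate limits in mind) and reading the returned count would give one bar per year; the truncated final year mirrors the “until the end of August” cutoff in the legend.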
Overview of six databases related to liquid–liquid phase separation (LLPS).
| Database | Organization | Data contents | Data sources | Outstanding features | Availability | Ref. |
|---|---|---|---|---|---|---|
| LLPSDB | Entries are defined by specific protein and/or nucleic acid constructs; protein type (natural, designed); component type (protein(s), protein(s) + RNA, protein(s) + DNA); component number (one, two, more) | 273 proteins | Validated by LLPS experiments in vitro | Includes natural and designed proteins; provides exhaustive, experimentally detected molecular modifications of specific protein constructs, including fusion, cleavage, mutation, repeats, and PTMs; provides explicit phase separation conditions (environmental parameters) and more than 200 phase diagrams | | |
| PhaSePro | Entries are defined by specific proteins | 121 proteins (109 from eukaryotes, 5 from bacteria, and 7 from viruses) | Validated by LLPS experiments in vitro and/or in vivo | Provides, for each protein, the experimentally validated LLPS driver region(s) and the molecular interaction types contributing to LLPS; introduces LLPS-specific controlled vocabularies (CVs) to annotate the functional, molecular, and experimental information of each protein; provides a broader array of structural, functional, and disease information | | |
| PhaSepDB | Entries are defined by specific proteins; data source (reviewed, UniProt reviewed, high throughput); location and organelle (more than 30 MLOs) | 2914 proteins | Validated by LLPS experiments; localized in membraneless compartments according to UniProt review and high-throughput experimental validation | Entries can be browsed by MLO location through graphical navigation on the home page; provides bioinformatic analyses of sequence properties such as PTMs, secondary structure distribution, electrostatic interactions, and hydrophobic residue distribution, displayed as an easily interpreted per-residue plot; provides sequence analysis of other human proteins | | |
| DrLLPS | Entries are defined by specific genes; condensates (in vitro droplet, nucleus, cytoplasm, germ cell, others); LLPS type (scaffold, regulator, client); species (animals, plants, fungi) | 437,887 proteins in 164 eukaryotes | Validated by LLPS experiments or by experimental identification of membraneless compartment components; identified computationally by BLAST searches of protein sequences | Holds the largest amount of data; includes the most comprehensive structure-related annotations, drawn from 110 public resources covering 16 aspects | | |
| RNAgranuleDB | Entries are defined by specific proteins; experiment design (discovery-based approach, candidate-based approach); evidence type (cell biological, physical, genetic); specific assay or dataset | 4385 proteins | Localized in stress granules (SGs) and P bodies (PBs), validated by experiments | All proteins are categorized into 4 tiers, weighted according to how strongly the evidence supports the protein's residence in SGs or PBs; proteins are analyzed with six first-generation LLPS predictors; lacks detailed information on LLPS | | |
| HUMAN CELL MAP | Entries are defined by specific genes | 4145 proteins | Localized in membrane-bound or membraneless organelles, identified by experiments combined with analysis | Summarizes, for each compartment, the enrichment of expected domains, motifs, and GO terms; provides channels to analyze spatiotemporal correlations between proteins in different organelles; lacks detailed information on LLPS | | |
Figure 2. Screenshots of webpages from the six related databases. In each boxed screenshot, the unique features of the corresponding database are highlighted in red text within the ellipse region(s).
Numbers of overlapping proteins between the six LLPS-related databases. (The number of proteins shared by any two databases was obtained through UniProt ID, except for RNAgranuleDB, for which gene name was used in comparisons with the other databases. The blue numbers on the diagonal show the number of proteins deposited in each database (for DrLLPS, potential orthologs were not included); for PhaSepDB, DrLLPS, and RNAgranuleDB these differ slightly from the numbers reported in the corresponding papers, probably due to corrections made after the databases' release.)
[Overlap matrix: LLPSDB, PhaSePro, PhaSepDB, DrLLPS, RNAgranuleDB, HUMAN CELL MAP; cell values not preserved in this copy.]
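The legend describes the overlap counts as pairwise set intersections keyed on UniProt ID, with gene names used instead whenever RNAgranuleDB is one of the pair. A minimal sketch of that procedure, assuming each database has already been exported as a list of records carrying both identifiers; the `Record` type, its field names, the helper functions, and the toy memberships are all hypothetical.

```python
from itertools import combinations
from typing import NamedTuple, Optional


class Record(NamedTuple):
    """One protein entry; these field names are illustrative assumptions."""
    uniprot_id: Optional[str]
    gene_name: str


def _keys(records: list[Record], by_gene: bool) -> set[str]:
    """Collect the comparison key of every record, skipping missing identifiers."""
    attr = "gene_name" if by_gene else "uniprot_id"
    return {getattr(r, attr) for r in records if getattr(r, attr)}


def overlap_matrix(
    dbs: dict[str, list[Record]],
    gene_name_dbs: frozenset = frozenset({"RNAgranuleDB"}),
) -> dict[tuple[str, str], int]:
    """Symmetric overlap counts; (a, a) holds the number of proteins in database a."""
    matrix = {}
    for name, records in dbs.items():
        matrix[(name, name)] = len(_keys(records, name in gene_name_dbs))
    for a, b in combinations(dbs, 2):
        # Compare by gene name whenever either side is a gene-name database.
        by_gene = a in gene_name_dbs or b in gene_name_dbs
        shared = _keys(dbs[a], by_gene) & _keys(dbs[b], by_gene)
        matrix[(a, b)] = matrix[(b, a)] = len(shared)
    return matrix


# Toy input for illustration only; the memberships are invented.
dbs = {
    "PhaSePro": [Record("Q01844", "EWSR1"), Record("P35637", "FUS")],
    "LLPSDB": [Record("P35637", "FUS"), Record("Q13148", "TARDBP")],
    "RNAgranuleDB": [Record(None, "FUS"), Record(None, "G3BP1")],
}
m = overlap_matrix(dbs)
print(m[("PhaSePro", "LLPSDB")])      # 1 (shared UniProt ID P35637)
print(m[("LLPSDB", "RNAgranuleDB")])  # 1 (shared gene name FUS)
```

Matching on gene names is looser than matching on UniProt IDs: synonyms, case conventions, and orthologs sharing a symbol can shift the counts, so a real comparison would likely need to normalize gene symbols first.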