
RNAPhaSep: a resource of RNAs undergoing phase separation.

Haibo Zhu1,2, Hao Fu1,2, Tianyu Cui3, Lin Ning3, Huaguo Shao1,4, Yehan Guo5, Yanting Ke5, Jiayi Zheng5, Hongyan Lin5, Xin Wu5,6, Guanghao Liu5,6, Jun He5,6, Xin Han4, Wenlin Li2,7, Xiaoyang Zhao8, Huasong Lu9, Dong Wang3, Kongfa Hu2, Xiaopei Shen1,5,6.   

Abstract

Liquid-liquid phase separation (LLPS) partitions cellular contents, underlies the formation of membraneless organelles and plays essential biological roles. To date, most of the research on LLPS has focused on proteins, especially RNA-binding proteins. However, accumulating evidence has demonstrated that RNAs can also function as 'scaffolds' and play essential roles in seeding or nucleating the formation of granules. To better utilize the knowledge dispersed in published literature, we here introduce RNAPhaSep (http://www.rnaphasep.cn), a manually curated database of RNAs undergoing LLPS. It contains 1113 entries with experimentally validated RNA self-assembly or RNA and protein co-involved phase separation events. RNAPhaSep contains various types of information, including RNA information, protein information, phase separation experiment information and integrated annotation from multiple databases. RNAPhaSep provides a valuable resource for exploring the relationship between RNA properties and phase behaviour, and may further enhance our comprehensive understanding of LLPS in cellular functions and human diseases.
© The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.


Year:  2022        PMID: 34718740      PMCID: PMC8728120          DOI: 10.1093/nar/gkab985

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

In addition to canonical membrane-bound organelles, eukaryotic cells contain numerous membraneless organelles (MLOs) that concentrate specific collections of proteins and nucleic acids (1,2). Liquid-liquid phase separation (LLPS), the demixing of a single homogeneous mixture into two immiscible fluids, has emerged as a general mechanism explaining how cells create MLOs in space and time (3–5). A large number of MLOs have been discovered to date, including stress granules, P bodies and even the nucleolus (6,7). These MLOs have been implicated in a wide range of cellular functions, organizing molecules that act in processes ranging from RNA metabolism to signalling and gene regulation (8–10). Moreover, aberrant MLO behaviour has been linked to multiple human diseases, such as neurodegeneration and cancer (11–13). Identifying the molecules that drive or undergo LLPS is fundamental to understanding the mechanisms of MLOs. Many, but not all, phase-separated biological condensates arise from proteins and RNAs (14), and many nuclear and cytoplasmic condensates are rich in RNAs and RNA-binding proteins (RBPs) that take part in LLPS (15,16). Whereas the roles of proteins in condensation have been well studied (17), less attention has been paid to the contribution of RNA to LLPS (18). As an anionic polymer, RNA is an excellent platform for achieving multivalency and accommodating RBPs (19,20). Proteins in condensates usually contain low-complexity domains (LCDs) or intrinsically disordered regions (IDRs) that enable weak, multivalent interactions and promote liquid-like properties, and unstructured sequences in RNAs may play a similar role in RNA-driven phase separation (21,22). Recent findings have confirmed that RNAs can function as ‘scaffolds’ and play essential roles in seeding or nucleating the formation of MLOs (23,24).
Experimental evidence has demonstrated that RNA self-assembly contributes to stress granule formation and defines the stress granule transcriptome (25). Furthermore, beyond driving LLPS, diverse RNA properties, such as composition, length, structure, modification and expression level, can modulate the biophysical features of native condensates, including their size, shape, viscosity, liquidity and surface tension (26–28). The molecular mechanisms underlying these associations remain unknown (14). Over the last five years, a large number of publications have reported RNAs involved in LLPS, and the number is still increasing rapidly. Five databases focused on proteins undergoing LLPS have been created, namely LLPSDB, PhaSePro, PhaSepDB, DrLLPS and RNAGranuleDB (29–33). However, a centralized resource for newly reported phase separation-related RNAs is still lacking; in particular, RNA self-assembly LLPS events have not been recorded in any published phase separation database. To fill this gap, we introduce RNAPhaSep, an RNA-centred database focusing on the properties of RNAs and their roles in the phase separation process. After careful manual curation, a total of 1113 entries of experimentally validated RNA self-assembly or RNA and protein co-involved phase separation events were included in RNAPhaSep. The database provides a convenient interface to help users browse, search and download RNA-related phase separation entries.

DATA COLLECTION AND PROCESSING

RNAPhaSep was constructed from information curated from published literature and the RNALocate database (34). Literature was retrieved from PubMed with the following query: ((((phase transition) OR (phase separation) OR (membraneless organelles) OR (biomolecular condensates)) AND RNA) AND cell). A total of 4,804 publications dated before 30 June 2021 were retrieved (Figure 1A). For review articles, we read each manuscript, extracted sentences describing RNA-related phase separation events and downloaded the corresponding research articles for curation. Because some known MLOs, such as P-bodies, had been studied extensively before LLPS emerged as the concept underlying MLO formation, we also used the RNALocate database (34), which records the subcellular localization of RNAs in MLOs (Figure 1A). For each MLO-related RNA record in RNALocate (34), the original research article was retrieved by its PMID. To identify publications describing MLOs or phase separation and the RNAs involved, we manually checked the abstracts or full texts of all these articles. During curation, we collected phase separation-associated RNAs together with as much supporting information as possible, such as the original supporting sentences, RNA sequence, mutations, phase separation experimental conditions, phase diagrams, compositions and the cell lines or tissues used in the experiments (Figure 1B).
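The PubMed retrieval step can be reproduced in outline with NCBI's E-utilities ESearch service. The sketch below only builds the request URL. Restricting by publication date (datetype=pdat), the 1900/01/01 lower bound and the retmax value are assumptions, since the text gives only the query and the 30 June 2021 cut-off; counts returned today may also differ from the 4,804 reported, because PubMed indexing changes over time.

```python
from urllib.parse import urlencode

# Boolean query exactly as given in the text.
QUERY = ("((((phase transition) OR (phase separation) OR "
         "(membraneless organelles) OR (biomolecular condensates)) "
         "AND RNA) AND cell)")

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(term: str, max_date: str = "2021/06/30",
                      min_date: str = "1900/01/01", retmax: int = 5000) -> str:
    """Return an ESearch URL that lists PubMed IDs matching `term`.

    Assumption: the cut-off applies to the publication date ("pdat").
    ESearch requires mindate and maxdate to be supplied together, so an
    early, hypothetical lower bound is used.
    """
    params = {
        "db": "pubmed",
        "term": term,
        "datetype": "pdat",
        "mindate": min_date,
        "maxdate": max_date,
        "retmax": retmax,      # assumed large enough for a few thousand hits
        "retmode": "json",
    }
    return f"{ESEARCH}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_esearch_url(QUERY))
```

The URL can be fetched with `urllib.request.urlopen`; in the JSON response, the total hit count and the PMIDs are found under `esearchresult` as `count` and `idlist`.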
Figure 1.

Overview of RNAPhaSep database. (A) Workflow of building RNAPhaSep. First, we searched the literature in PubMed by keywords or by PMIDs extracted from MLO-related RNAs in the RNALocate database, and collected 1113 RNA-protein and RNA self-assembly phase separation events. Then we integrated annotations from 15 public data resources and presented structured and visualized data through web architecture and database servers. (B) The data structure of RNAPhaSep. Different types of information are distinguished by colour.

RNAPhaSep integrates two types of RNA identifiers: NCBI Gene IDs (35) and RNAcentral identifiers (36). For several rRNAs and snoRNAs that could not be found in NCBI, an RNAcentral identifier is supplied instead. The secondary structure diagram of each RNA was generated with the RNAfold server of the ViennaRNA Web Services (37). LCD and IDR information for each protein was collected from MobiDB (38) or PONDR (39). Because the molecular properties of RNAs are essential for understanding their potential phase behaviour, we integrated, for each natural RNA, its description in NCBI (35), molecular function in Gene Ontology (GO) (40), subcellular localization in RNALocate (34), interaction neighbours in RNAinter (41) and associated diseases in DisGeNET (42) and OMIM (43) as RNA annotation information (Figure 1B). Designed RNAs were divided into four subclasses based on sequence characteristics, assigned in the following order of preference: ‘poly RNA’ if the sequence is a repetition of a single nucleotide; ‘repeat RNA’ if the sequence is a repetition of a fragment containing at least two different nucleotides; ‘nucleotide rich RNA’ if the sequence is enriched in two specific nucleotides; and ‘irregular RNA’ for all remaining sequences. Importantly, RNAPhaSep covers only cases in which the RNA, alone or together with other components (DNA or protein), was experimentally validated to undergo LLPS in vitro or in vivo.
Systems in which only proteins or DNA were the main components were therefore excluded. Systems containing RNA mixtures, such as total mRNA, were included, because RNA was the only component driving LLPS in these systems. After sorting the records, we found no reports of LLPS co-involving DNA and RNA, possibly because research in this area is limited; such events may be included in an updated version of RNAPhaSep. The state of a phase-separated system can vary dynamically over a wide range, from liquid to solid. Four states, ‘solute’, ‘liquid’, ‘gel’ and ‘solid’, were used to define the morphological characteristics of phase separation (1). When a change in experimental conditions led to a phase transition, for example from liquid to gel, ‘liquid, gel’ was recorded as the morphology of the event. All RNA-related phase separation events curated in RNAPhaSep were verified experimentally, for example by reconstituting condensates in vitro or by examining droplet formation with immunofluorescence in vivo. RNAs detected by high-throughput methods were excluded.
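Two of the curation rules above are mechanical enough to sketch in code. The Python below is a minimal illustration, not RNAPhaSep's own code: it applies the four designed-RNA subclasses in the stated order of preference and records a phase transition as a comma-separated morphology string. The 80% cut-off for ‘nucleotide rich RNA’ and the requirement that a ‘repeat RNA’ be an exact whole-number repetition of its unit are assumptions, since the text defines neither precisely.

```python
from collections import Counter
from typing import Iterable

# Assumption: the text does not define "enriched", so here the two most
# frequent nucleotides must make up at least 80% of the sequence.
RICH_THRESHOLD = 0.8
STATES = ("solute", "liquid", "gel", "solid")


def _is_tandem_repeat(seq: str) -> bool:
    """True if seq is two or more exact copies of a unit containing at
    least two different nucleotides (assumption: no partial copies)."""
    n = len(seq)
    for k in range(2, n // 2 + 1):
        unit = seq[:k]
        if n % k == 0 and len(set(unit)) >= 2 and unit * (n // k) == seq:
            return True
    return False


def classify_designed_rna(seq: str) -> str:
    """Assign a designed RNA to a subclass, testing the rules in the
    stated order of preference."""
    seq = seq.upper().replace("T", "U")
    if not seq:
        raise ValueError("empty sequence")
    if len(set(seq)) == 1:
        return "poly RNA"
    if _is_tandem_repeat(seq):
        return "repeat RNA"
    top_two = sum(count for _, count in Counter(seq).most_common(2))
    if top_two / len(seq) >= RICH_THRESHOLD:
        return "nucleotide rich RNA"
    return "irregular RNA"


def format_morphology(observed: Iterable[str]) -> str:
    """Record the states seen in one event in order of observation,
    e.g. a liquid-to-gel transition becomes 'liquid, gel'."""
    states = list(dict.fromkeys(s.strip().lower() for s in observed))
    unknown = [s for s in states if s not in STATES]
    if unknown or not states:
        raise ValueError(f"invalid states: {unknown or 'none given'}")
    return ", ".join(states)


def morphology_bucket(morphology: str) -> str:
    """Collapse a recorded morphology into the Figure 2E categories."""
    states = [s.strip() for s in morphology.split(",") if s.strip()]
    return "multiple states" if len(states) > 1 else states[0]
```

Checking the rules in a fixed order matters: a sequence such as AUAUAU is both A/U-rich and a repeat, and the order of preference resolves it to ‘repeat RNA’.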

DATA CONTENT

As of August 2021, RNAPhaSep included 1113 curated entries of RNA self-assembly or RNA and protein co-involved phase separation events, involving 325 non-redundant RNAs from 22 organisms (Figure 2A). To reduce data redundancy, we consolidated entries that share the same RNA name, or the same RNA and protein names, and assigned all entries to 628 unique RNAPSIDs. For each RNA, we collected and organized properties such as composition, species, classification, sequence, length, structure, subcellular localization, RNA interaction neighbours, related molecular functions and diseases.
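The consolidation step can be pictured as grouping entries by a key built from the RNA name and the set of protein names. The sketch below is hypothetical: the dictionary fields `rna` and `proteins`, the case-insensitive matching and the `RNAPS0000001`-style identifier format are illustrative assumptions, not RNAPhaSep's actual RNAPSID scheme.

```python
from typing import Dict, FrozenSet, Iterable, List, Tuple

# Consolidation key: (normalized RNA name, set of normalized protein names).
Key = Tuple[str, FrozenSet[str]]


def assign_rnapsids(entries: Iterable[dict], prefix: str = "RNAPS",
                    width: int = 7) -> Tuple[List[str], Dict[str, Key]]:
    """Give entries that share an RNA name and the same set of protein
    names one common identifier.

    Returns (ids, groups): ids[i] is the identifier of the i-th entry and
    groups maps each identifier back to its consolidation key.
    """
    key_to_id: Dict[Key, str] = {}
    ids: List[str] = []
    for entry in entries:
        # Assumption: names match case-insensitively; protein order is ignored.
        key = (entry["rna"].strip().lower(),
               frozenset(p.strip().lower() for p in entry.get("proteins", ())))
        if key not in key_to_id:
            # Hypothetical ID format: prefix plus a zero-padded counter.
            key_to_id[key] = f"{prefix}{len(key_to_id) + 1:0{width}d}"
        ids.append(key_to_id[key])
    return ids, {rid: key for key, rid in key_to_id.items()}
```

Using a frozenset for the proteins means that entries listing the same partners in a different order collapse into one identifier, while an RNA-only entry keeps an identifier separate from the same RNA with proteins.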
Figure 2.

Data statistics on RNAPhaSep. (A) Organism distribution and (B) category distribution of natural RNAs. (C) Category distribution of designed RNAs. (D) Distribution of in vivo and in vitro experiments (inner circle) and composition distribution in vitro (outer ring) and (E) distribution of morphology for in vitro and in vivo LLPS events (outer ring). The ‘multiple states’ means that the phase separation event contains at least two states like ‘liquid, gel’. (F) Percentage of ribonucleotide and (G) distribution of length for RNAs from in vitro experiments.

RNAs were classified as natural or designed (Figure 2B and C). Natural RNAs often carry diverse annotations that describe their molecular functions in the cell comprehensively, but the impact of RNA sequence on phase separation is shown more clearly by designed RNAs, because they can be engineered and are of low complexity. In vitro experiments are especially valuable because their simplified setting helps researchers identify the conditions required for phase separation: by controlling experimental details such as salt concentration, buffer, temperature and pH, researchers can mimic intracellular phase separation events. Most importantly, the components involved in in vitro experiments are known, so we classified in vitro entries as ‘RNA(s)’, ‘RNA + protein(s)’ or ‘RNA(s) + protein(s)’ (Figure 2D). The morphological distribution of phase separation records is shown in Figure 2E. The RNA sequences of in vivo records, mostly taken from NCBI (35) or RNAcentral (36) annotations, were not validated in the corresponding in vivo phase separation experiments; therefore, only the 184 RNAs from in vitro records were used for sequence and length analysis. Sequence analysis showed that LLPS-related RNAs are enriched in adenine and uracil, which together account for 58.2%, 58% and 67.5% of the nucleotides in all, natural and designed RNAs, respectively (Figure 2F).
Motif analysis of the RNA sequences with STREME (44) was performed to discover common sequence elements, but no significant motifs were found, possibly because the number of RNAs is relatively small. Designed RNAs tend to have shorter sequences than natural RNAs (Figure 2G). SARS-CoV-2, a novel coronavirus, has caused the ongoing worldwide COVID-19 pandemic. Studies have shown that viral genomic RNAs can form phase-separated droplets with the nucleocapsid protein and that these droplets become solid-like structures as the RNA length increases (45,46). The morphology of these condensates therefore depends strongly on the length and concentration of the RNAs involved. RNAPhaSep currently contains 76 phase separation entries involving SARS-CoV-2.
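The composition and category statistics behind Figure 2D, 2F and 2G are simple counts. The helpers below are a minimal sketch under two assumptions the text does not settle: nucleotide percentages are pooled over all sequences rather than averaged per sequence, and ‘RNA + protein(s)’ means a single RNA with one or more proteins, while ‘RNA(s) + protein(s)’ means several RNAs with proteins.

```python
from collections import Counter
from statistics import median
from typing import Dict, Iterable

BASES = "ACGU"


def pooled_composition(seqs: Iterable[str]) -> Dict[str, float]:
    """Percentage of A, C, G and U pooled over all sequences.

    Assumption: pooled counts, not a per-sequence average. T is read as U
    and any other symbol (e.g. N) is ignored.
    """
    counts: Counter = Counter()
    for s in seqs:
        counts.update(s.upper().replace("T", "U"))
    total = sum(counts[b] for b in BASES)
    if total == 0:
        raise ValueError("no A/C/G/U nucleotides found")
    return {b: 100.0 * counts[b] / total for b in BASES}


def au_percentage(seqs: Iterable[str]) -> float:
    """Combined A + U percentage, the quantity summarized for Figure 2F."""
    comp = pooled_composition(seqs)
    return comp["A"] + comp["U"]


def median_length(seqs: Iterable[str]) -> float:
    """Median sequence length, one simple summary of Figure 2G."""
    return median(len(s) for s in seqs)


def in_vitro_category(n_rna: int, n_protein: int) -> str:
    """Component category of an in vitro entry (Figure 2D).

    Assumption: the categories are distinguished by counting RNAs and
    proteins as described in the lead-in.
    """
    if n_rna < 1:
        raise ValueError("an RNAPhaSep entry involves at least one RNA")
    if n_protein == 0:
        return "RNA(s)"
    return "RNA + protein(s)" if n_rna == 1 else "RNA(s) + protein(s)"
```

Pooling weights long RNAs more heavily than short ones; averaging per-sequence percentages instead would give designed and natural RNAs equal weight per molecule, so the two choices can yield different A + U figures.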

WEB INTERFACE

A user-friendly website has been developed for searching, browsing and downloading RNA-related phase separation data. It comprises eight modules: Home, Search/Blast, Browse, Submit, Download, Statistics, Help and Contact. The Search/Blast page offers two ways of searching: ‘By options’ (a combination of keywords, component type, species and RNA type) and ‘By RNA sequence’ (an input RNA sequence) (Figure 3A). For ‘Search by options’, three typical examples are provided: clicking an example button fills in the options automatically, and clicking the ‘search’ button then presents the ‘Search Result’ as a table. The ‘Search by RNA sequence’ module lets users assess the sequence similarity between their target RNA and the LLPS-related RNAs stored in the database. Together, these search modes help users quickly screen how their RNA of interest contributes to LLPS under the available conditions, with or without partners.
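The ‘By options’ search can be read as a conjunctive filter over entries. The Python sketch below assumes that all selected options must match (logical AND) and that the keyword is a case-insensitive substring match; the `Entry` fields are hypothetical. It does not cover the ‘By RNA sequence’ mode, which presumably relies on BLAST-style alignment rather than field matching.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Entry:
    """Hypothetical, simplified view of one RNAPhaSep record."""
    rnapsid: str
    rna_symbol: str
    proteins: Tuple[str, ...]
    component_type: str  # e.g. "RNA(s)" or "RNA + protein(s)"
    species: str
    rna_type: str        # e.g. "natural" or "designed"
    description: str = ""


def search_by_options(entries: Iterable[Entry],
                      keyword: Optional[str] = None,
                      component_type: Optional[str] = None,
                      species: Optional[str] = None,
                      rna_type: Optional[str] = None) -> List[Entry]:
    """Keep entries that satisfy every option that was set.

    Assumption: options combine with logical AND, and the keyword is a
    case-insensitive substring of the RNA symbol, any protein name or
    the free-text description.
    """
    kw = keyword.lower() if keyword else None
    hits = []
    for e in entries:
        if kw and not (kw in e.rna_symbol.lower()
                       or any(kw in p.lower() for p in e.proteins)
                       or kw in e.description.lower()):
            continue
        if component_type and e.component_type != component_type:
            continue
        if species and e.species != species:
            continue
        if rna_type and e.rna_type != rna_type:
            continue
        hits.append(e)
    return hits
```

Unset options (None) are simply skipped, so calling the function with no options returns every entry, which mirrors an empty search form.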
Figure 3.

An illustration of the RNAPhaSep website. (A) Search the database by options (such as RNA symbol or protein name) or by RNA sequence on the Search/Blast page. (B) The dataset can be browsed by two classifications (in vivo/in vitro or RNA type) on the Browse page. RNAPSIDs in the search results and on the Browse page link to (C) the phase separation detail page, which consists of three main modules (RNA information, protein information, and phase separation experiment and description) and includes visualizations of the RNA sequence, structure and phase diagram. (D) The RNA annotation page contains comprehensive information annotated from public databases and a visualization of the RNA interaction network.

The ‘Browse’ page briefly describes the data sources and contains a table of all entries, which can be divided into subsets by in vivo/in vitro and RNA type (Figure 3B). Clicking a unique RNAPSID opens the ‘Phase Separation Details Page’, which describes the RNA, proteins and experimental conditions involved (Figure 3C). For the RNA, the phase separation-related sequence and structure are displayed. For each protein involved, the corresponding UniProt ID, IDR, LCD, mutation, modification and sequence information are listed where available. To give users easy access to broader information on protein phase separation, each protein is linked directly to its detail page in the protein-centred LLPS databases LLPSDB, PhaSePro and DrLLPS (29,30,32). Because PhaSepDB does not provide a unique link for each protein, a link to its ‘Browse Page’ is supplied instead (31). The same protein may have different sequences or structures in different LLPS experiments, for example wild-type, mutant or post-translationally modified forms, and these differences may affect the LLPS conditions.
The ‘State’ of each protein is therefore used to distinguish different protein sequences or structures and their corresponding experimental conditions (Figure 3C). For each LLPS experiment, a phase diagram is shown graphically where available. For natural RNAs, which have extensive annotation, clicking the RNA symbol takes the user to the RNA annotation page, which integrates the RNA description (35), molecular function (40), subcellular location (34), RNA interaction network (top 100 extracted from RNAinter) (41) and associated diseases (42,43) (Figure 3D). A detailed tutorial on using the database is available on the ‘Help’ page. The ‘Submission’ page offers three options according to the data source: published, preprint and unpublished data. Submitted published or preprint data are routinely reviewed manually and formatted as database entries, labelled with a PMID or tagged as ‘preprint’, respectively. Because authors may not wish to share the original experimental evidence for unpublished data, such data are tagged as ‘unpublished’, and the phase separation page carries the reminder “Entries with ‘unpublished’ tag have relatively lower reliability”.
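The ‘State’ field and the submission tags can be illustrated with a small data model. In the hypothetical Python classes below, a protein's identity includes its state, so a wild-type and a mutant form of the same protein are stored as distinct components, and an entry's source tag decides whether the lower-reliability reminder is shown. The class and field names, and the rule that published entries must carry a PMID, are assumptions inferred from the description above.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SOURCES = {"published", "preprint", "unpublished"}
UNPUBLISHED_NOTE = "Entries with 'unpublished' tag have relatively lower reliability"


@dataclass(frozen=True)
class ProteinComponent:
    """Hypothetical model of a protein as used in one experiment.

    Because the class is frozen, equality and hashing include 'state',
    so wild-type and mutant forms remain separate components.
    """
    name: str
    uniprot_id: Optional[str] = None
    state: str = "wild-type"  # e.g. "wild-type", "mutant", "phosphorylated"


@dataclass
class PhaseSeparationEntry:
    """Hypothetical model of one detail-page record."""
    rnapsid: str
    rna_symbol: str
    proteins: List[ProteinComponent] = field(default_factory=list)
    source: str = "published"
    pmid: Optional[str] = None

    def __post_init__(self) -> None:
        if self.source not in SOURCES:
            raise ValueError(f"unknown source: {self.source!r}")
        # Assumption: published submissions are always keyed by a PMID.
        if self.source == "published" and not self.pmid:
            raise ValueError("published entries must carry a PMID")

    @property
    def reliability_note(self) -> Optional[str]:
        """Reminder shown on the detail page for unpublished data."""
        return UNPUBLISHED_NOTE if self.source == "unpublished" else None
```

Validating the source tag at construction time keeps inconsistent records (for example, a ‘published’ entry without a PMID) out of the database rather than flagging them later.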

Database implementation

The data are stored in a MySQL v5.7 (https://www.mysql.com) database. The web application is built on Django v3.2 (https://www.djangoproject.com) and runs on a CentOS Linux server. The website has been tested in the Google Chrome, Mozilla Firefox, Microsoft Edge and Apple Safari browsers. RNAPhaSep is freely available to the research community at http://www.rnaphasep.cn.
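RNAPhaSep itself runs on MySQL behind Django; the sketch below instead uses Python's built-in sqlite3 module as a self-contained stand-in to show one plausible relational layout for the records described above. All table and column names are hypothetical, and the toy rows are invented for illustration only.

```python
import sqlite3

# Hypothetical schema; sqlite3 stands in for the MySQL backend.
SCHEMA = """
CREATE TABLE rna (
    rna_id        INTEGER PRIMARY KEY,
    symbol        TEXT NOT NULL,
    ncbi_gene_id  TEXT,           -- may be NULL, e.g. some rRNAs/snoRNAs
    rnacentral_id TEXT,
    rna_type      TEXT            -- 'natural' or 'designed'
);
CREATE TABLE protein (
    protein_id INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    uniprot_id TEXT
);
CREATE TABLE entry (
    entry_id   INTEGER PRIMARY KEY,
    rnapsid    TEXT NOT NULL,
    rna_id     INTEGER NOT NULL REFERENCES rna(rna_id),
    in_vitro   INTEGER NOT NULL,  -- 1 = in vitro, 0 = in vivo
    morphology TEXT               -- e.g. 'liquid' or 'liquid, gel'
);
CREATE TABLE entry_protein (
    entry_id   INTEGER NOT NULL REFERENCES entry(entry_id),
    protein_id INTEGER NOT NULL REFERENCES protein(protein_id),
    state      TEXT NOT NULL DEFAULT 'wild-type',
    PRIMARY KEY (entry_id, protein_id, state)
);
"""


def entries_for_rna(conn: sqlite3.Connection, symbol: str):
    """Return (rnapsid, protein name, protein state) rows for one RNA."""
    sql = """
        SELECT e.rnapsid, p.name, ep.state
        FROM entry AS e
        JOIN rna AS r ON r.rna_id = e.rna_id
        LEFT JOIN entry_protein AS ep ON ep.entry_id = e.entry_id
        LEFT JOIN protein AS p ON p.protein_id = ep.protein_id
        WHERE r.symbol = ?
        ORDER BY e.rnapsid, p.name
    """
    return conn.execute(sql, (symbol,)).fetchall()


def demo_connection() -> sqlite3.Connection:
    """In-memory database filled with made-up toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO rna VALUES (1, 'toyRNA', NULL, NULL, 'designed')")
    conn.executemany("INSERT INTO protein VALUES (?, ?, NULL)",
                     [(1, "ProtA"), (2, "ProtB")])
    conn.executemany("INSERT INTO entry VALUES (?, ?, 1, 1, ?)",
                     [(1, "TOY0001", "liquid"), (2, "TOY0002", "liquid, gel")])
    conn.executemany("INSERT INTO entry_protein VALUES (?, ?, ?)",
                     [(1, 1, "wild-type"), (2, 1, "mutant"), (2, 2, "wild-type")])
    return conn
```

Putting the protein state in the link table rather than in the protein table lets one protein appear as wild-type in one experiment and as a mutant in another, which is the distinction the ‘State’ field is meant to capture.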

CONCLUSION

Here, we present RNAPhaSep, a novel resource on RNA-related phase separation built from information obtained from the literature and the RNALocate database. It contains 1113 experimentally validated RNA self-assembly or RNA and protein co-involved phase separation events to help guide further studies of LLPS. RNAPhaSep was designed specifically for RNA-related phase separation, and we believe it will be a useful tool for researchers in this field. Recent perspectives and reviews show that the role of RNA in driving phase separation is attracting increasing attention. The establishment of RNAPhaSep is only the first step, and we will continue to expand and improve it to meet further needs in this field. We are now collecting a reliable LLPS corpus to develop a text mining system that can automatically extract LLPS information from biomedical literature in PubMed. To ensure that all literature-derived data have consistently high reliability, the automatically extracted records will still be curated manually. Once this system is in place, RNAPhaSep can be updated more frequently.
REFERENCES (46 in total)

Review 1.  RNA contributions to the form and function of biomolecular condensates.

Authors:  Christine Roden; Amy S Gladfelter
Journal:  Nat Rev Mol Cell Biol       Date:  2020-07-06       Impact factor: 94.444

Review 2.  The molecular language of membraneless organelles.

Authors:  Edward Gomes; James Shorter
Journal:  J Biol Chem       Date:  2018-07-25       Impact factor: 5.157

Review 3.  Protein Phase Separation: A New Phase in Cell Biology.

Authors:  Steven Boeynaems; Simon Alberti; Nicolas L Fawzi; Tanja Mittag; Magdalini Polymenidou; Frederic Rousseau; Joost Schymkowitz; James Shorter; Benjamin Wolozin; Ludo Van Den Bosch; Peter Tompa; Monika Fuxreiter
Journal:  Trends Cell Biol       Date:  2018-03-27       Impact factor: 20.808

4.  STREME: Accurate and versatile sequence motif discovery.

Authors:  Timothy L Bailey
Journal:  Bioinformatics       Date:  2021-03-24       Impact factor: 6.937

Review 5.  Dynamic transcriptomic m6A decoration: writers, erasers, readers and functions in RNA metabolism.

Authors:  Ying Yang; Phillip J Hsu; Yu-Sheng Chen; Yun-Gui Yang
Journal:  Cell Res       Date:  2018-05-22       Impact factor: 25.617

6.  Genomic RNA Elements Drive Phase Separation of the SARS-CoV-2 Nucleocapsid.

Authors:  Christiane Iserman; Christine A Roden; Mark A Boerneke; Rachel S G Sealfon; Grace A McLaughlin; Irwin Jungreis; Ethan J Fritch; Yixuan J Hou; Joanne Ekena; Chase A Weidmann; Chandra L Theesfeld; Manolis Kellis; Olga G Troyanskaya; Ralph S Baric; Timothy P Sheahan; Kevin M Weeks; Amy S Gladfelter
Journal:  Mol Cell       Date:  2020-11-27       Impact factor: 17.970

7.  The Gene Ontology resource: enriching a GOld mine.

Authors: 
Journal:  Nucleic Acids Res       Date:  2021-01-08       Impact factor: 16.971

8.  RNALocate v2.0: an updated resource for RNA subcellular localization with increased coverage and annotation.

Authors:  Tianyu Cui; Yiying Dou; Puwen Tan; Zhen Ni; Tianyuan Liu; DuoLin Wang; Yan Huang; Kaican Cai; Xiaoyang Zhao; Dong Xu; Hao Lin; Dong Wang
Journal:  Nucleic Acids Res       Date:  2022-01-07       Impact factor: 16.971

9.  Free mRNA in excess upon polysome dissociation is a scaffold for protein multimerization to form stress granules.

Authors:  Ouissame Bounedjah; Bénédicte Desforges; Ting-Di Wu; Catherine Pioche-Durieu; Sergio Marco; Loic Hamon; Patrick A Curmi; Jean-Luc Guerquin-Kern; Olivier Piétrement; David Pastré
Journal:  Nucleic Acids Res       Date:  2014-07-10       Impact factor: 16.971

10.  LLPSDB: a database of proteins undergoing liquid-liquid phase separation in vitro.

Authors:  Qian Li; Xiaojun Peng; Yuanqing Li; Wenqin Tang; Jia'an Zhu; Jing Huang; Yifei Qi; Zhuqing Zhang
Journal:  Nucleic Acids Res       Date:  2020-01-08       Impact factor: 16.971
