RNAct: Protein-RNA interaction predictions for model organisms with supporting experimental data.

Benjamin Lang¹, Alexandros Armaos¹, Gian G Tartaglia¹,²,³,⁴.

Abstract

Protein-RNA interactions are implicated in a number of physiological roles as well as diseases, with molecular mechanisms ranging from defects in RNA splicing, localization and translation to the formation of aggregates. Currently, ∼1400 human proteins have experimental evidence of RNA-binding activity. However, only ∼250 of these proteins currently have experimental data on their target RNAs from various sequencing-based methods such as eCLIP. To bridge this gap, we used an established, computationally expensive protein-RNA interaction prediction method, catRAPID, to populate a large database, RNAct. RNAct allows easy lookup of known and predicted interactions and enables global views of the human, mouse and yeast protein-RNA interactomes, expanding them in a genome-wide manner far beyond experimental data (http://rnact.crg.eu).

Substances:
RNA-Binding Proteins
RNA

Year: 2019 PMID: 30445601 PMCID: PMC6324028 DOI: 10.1093/nar/gky967

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

RNA-binding proteins (RBPs) are key in RNA splicing, processing, export, localization and regulation of translation and are implicated in a number of pathologies in humans. Examples include heterogeneous and life-threatening genetic disorders, such as amyotrophic lateral sclerosis (1), spinocerebellar ataxia and retinitis pigmentosa, among others (2,3). Human proteins encoded by 1393 genes currently have experimental evidence of RNA-binding activity (4–6). These proteins contain one or more RNA-binding regions, either in the form of canonical globular domains or of more recently discovered, intrinsically disordered RNA interaction regions (7,8). Additionally, protein–protein interaction interfaces and enzymatic active sites are sometimes employed for RNA binding (4,9). Protein–RNA interactions form an intricate network, and RNAs play structural roles in many types of phase-separated biological condensates, such as stress granules (10). However, the RNA interaction partners are known for far fewer RBPs: only 250 Homo sapiens RBPs currently have high-throughput experimental data on the identity of their target RNAs (11,12), obtained mostly by various sequencing-based methods such as eCLIP, iCLIP, HITS-CLIP, PAR-CLIP and RIP-seq. Much smaller datasets are available for Mus musculus (38 RBPs (12)), Drosophila melanogaster (29 RBPs from RIP-seq (13)) and Saccharomyces cerevisiae (69 RBPs from RIP-Chip (14)). A comprehensive collection of CLIP data is available in the recently expanded POSTAR database (12), previously called CLIPdb, which also includes motif-based target predictions for a set of human and mouse RBPs (88 and 82, respectively).
To bridge the gap between the 1393 known RBPs and the 250 for which we have experimental knowledge of interaction partners, we used an established, experimentally validated (15,16) protein–RNA interaction prediction method, catRAPID (17–19), to generate proteome- and transcriptome-wide sets of interaction predictions. Our database now covers the H. sapiens, M. musculus and S. cerevisiae genomes and contains a total of 5.87 billion pairwise interactions. This reflects nearly 120 years of computation time on the Centre for Genomic Regulation's high-performance computing cluster, and for the first time provides all possible protein–RNA interactions in these species. RNAct makes available our genome-wide protein–RNA interaction predictions and combines them with powerful and intuitive search functionality, including pairwise search for sets of proteins and RNAs. The display is enriched with useful annotation, including transcript support level (TSL) and APPRIS classification for isoforms and RNA subcellular localization from the RNALocate database. Known RBPs as well as interactions confirmed by large-scale experiments from the ENCODE project are clearly highlighted.

MATERIALS AND METHODS

Proteomes

Proteomes were obtained from UniProt (20). Sequence files containing all canonical sequences from each organism’s reference proteome were obtained from the UniProt FTP server (these exclude the ‘additional’ isoform transcripts for a given UniProt accession). This resulted in successful interaction predictions for 20 778 canonical human proteins (proteome UP000005640 from UniProt release 2017_10), 22 080 canonical mouse proteins (proteome UP000000589 from UniProt 2018_01, strain C57BL/6J) and 5 963 canonical yeast proteins (proteome UP000002311 from UniProt 2018_06, strain ATCC 204508 / S288c).

Transcriptomes

Transcriptomes were obtained from GENCODE (for human and mouse) (21) and Ensembl (for yeast) (22). GENCODE ‘basic’ RNAs are a representative subset prioritizing full-length protein-coding transcripts over partial or non-coding transcripts for a given gene. The GENCODE release used for human is Release 27 (genome assembly GRCh38.p10), and both the ‘basic’ (98 608 transcripts with successful interaction predictions) and ‘non-basic’ (100 722 transcripts) subsets were obtained for full coverage of the human GENCODE transcriptome. These sets are kept separate for performance reasons, and the protein view currently does not show non-basic human RNAs (except in the pairwise search). For mouse, GENCODE release M16 (genome assembly GRCm38.p5) was used, retaining only the ‘basic’ subset (76 532 transcripts, ∼58% of the mouse GENCODE transcriptome) due to resource and computation time constraints. For yeast, all coding and non-coding transcripts from the Ensembl 92 release (April 2018) were included (7029 transcripts with successful interaction predictions). All FASTA sequence files used are available for download in the RNAct Download section. A small number of these sequences were excluded from RNAct due to limitations of the catRAPID algorithm: short or extreme length (proteins ≤50 aa or >14 507 aa, RNAs ≤50 nt or >28 227 nt), or unsuccessful RNA secondary structure prediction using the ViennaRNA package which catRAPID relies on (23).
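The length limits described above can be written as a simple pre-filter. The following is a minimal sketch in Python, assuming sequences are held in a dictionary keyed by identifier; the function and constant names are hypothetical and are not part of RNAct or catRAPID. Only the numeric thresholds come from the text, and the second exclusion criterion (failed ViennaRNA secondary structure prediction) is not modelled here.

```python
from typing import Dict

# Thresholds quoted in the text: proteins <=50 aa or >14 507 aa and
# RNAs <=50 nt or >28 227 nt are excluded. All names are hypothetical.
PROTEIN_MIN_EXCLUSIVE, PROTEIN_MAX_INCLUSIVE = 50, 14_507  # amino acids
RNA_MIN_EXCLUSIVE, RNA_MAX_INCLUSIVE = 50, 28_227          # nucleotides


def within_length_limits(length: int, kind: str) -> bool:
    """Return True if a sequence of this length falls inside the stated limits."""
    if kind == "protein":
        return PROTEIN_MIN_EXCLUSIVE < length <= PROTEIN_MAX_INCLUSIVE
    if kind == "rna":
        return RNA_MIN_EXCLUSIVE < length <= RNA_MAX_INCLUSIVE
    raise ValueError(f"unknown sequence kind: {kind!r}")


def filter_sequences(seqs: Dict[str, str], kind: str) -> Dict[str, str]:
    """Keep only the sequences whose length is inside the limits for `kind`."""
    return {
        name: seq
        for name, seq in seqs.items()
        if within_length_limits(len(seq), kind)
    }
```

Note the asymmetry of the bounds: a sequence of exactly 50 residues is excluded, while one of exactly 14 507 aa (or 28 227 nt) is retained.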

Interaction predictions (catRAPID maximum fragment score)

To compute the interaction propensity scores, we used the catRAPID approach (17) with the fragmentation procedure (18,19) and normalized for sequence lengths (19). For each protein–RNA pair, the fragments with the maximum interaction propensity score are used to assess overall binding ability (Figure 1A). The catRAPID score shows a receiver operating characteristic (ROC) area under the curve (AUC) of 0.78 with high-confidence eCLIP data (212 256 interactions with human GENCODE ‘basic’ RNAs, replicated in at least one cell line studied in ENCODE and in all replicates in each).
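The idea of a maximum fragment score can be illustrated with a short sketch. This is not the catRAPID implementation: the window sizes, step size and the scoring function passed in as `score_fn` are placeholder assumptions, and only the final step (taking the maximum over all protein fragment and RNA fragment pairs) reflects the description above.

```python
from itertools import product
from typing import Callable, List


def fragments(seq: str, window: int, step: int) -> List[str]:
    """Overlapping windows over `seq`; a short sequence is returned whole."""
    if len(seq) <= window:
        return [seq]
    starts = list(range(0, len(seq) - window + 1, step))
    # Add a final window so that the tail of the sequence is always covered.
    if starts[-1] != len(seq) - window:
        starts.append(len(seq) - window)
    return [seq[i:i + window] for i in starts]


def max_fragment_score(
    protein: str,
    rna: str,
    score_fn: Callable[[str, str], float],
    protein_window: int = 50,  # hypothetical value
    rna_window: int = 50,      # hypothetical value
    step: int = 25,            # hypothetical value
) -> float:
    """Score every protein/RNA fragment pair and return the maximum."""
    protein_frags = fragments(protein, protein_window, step)
    rna_frags = fragments(rna, rna_window, step)
    return max(score_fn(p, r) for p, r in product(protein_frags, rna_frags))
```

Any function taking a protein fragment and an RNA fragment and returning a number can be supplied as `score_fn`, which keeps the fragmentation logic separate from the scoring model.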

Figure 1.

(A) Interaction propensity scores for the background (sampled from slightly over 2 billion human protein–RNA pairs; light red) and positive set (212 256 high-confidence protein–RNA interactions revealed by eCLIP; cyan). The z-score reported in the results pages is computed on the right-skewed blue distribution, with the solid cyan line indicating the mean and the dashed line indicating a z-score of 1 (one standard deviation above the mean). (B) The area under the ROC curve of 0.78 (0.71 upon length normalization) indicates the predictive performance of the catRAPID method on recent high-confidence experimental eCLIP data from the ENCODE project.

When including all eCLIP interactions regardless of replication (723 881 interactions for GENCODE ‘basic’ RNAs), this AUC is still 0.76. Normalizing the prediction score by sequence lengths, similarly to a previous work (19), we found that the predictive performance decreases slightly (to an AUC of 0.71 on the high-confidence interactions, and of 0.70 on all). This indicates a size effect, potentially due to the RNAse digestion step in CLIP protocols. We stress that the method was trained on X-ray and NMR data, and that its performance on the experimental CLIP data reflects its predictive power (Figure 1B). RNAct displays the length-normalized prediction scores, with raw catRAPID scores available for download upon request.
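The two statistics used above, a z-score relative to a background distribution and the area under the ROC curve, can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, the use of the population standard deviation is an assumption (the text does not say which estimator was used), and the pairwise AUC computation is quadratic and therefore suitable for small examples rather than for billions of pairs.

```python
from statistics import mean, pstdev
from typing import Sequence


def z_score(score: float, background: Sequence[float]) -> float:
    """Standard deviations by which `score` exceeds the background mean."""
    mu = mean(background)
    sd = pstdev(background)  # assumption: population standard deviation
    return (score - mu) / sd


def roc_auc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

Under this definition, a z-score of 1 corresponds to the dashed line in Figure 1A, and an AUC of 0.5 corresponds to a score that cannot separate eCLIP-supported pairs from background pairs.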

Experimental interaction data (ENCODE eCLIP)

Experimental interaction data covering 119 human RBPs using eCLIP in the HepG2 and K562 cell lines (170 total experiments) were obtained from the ENCODE Project in narrowPeak format (11,24,25). This represents the largest single dataset of experimental protein–RNA interaction data currently available. Additional experimentally determined interactions covering 69 RBPs in yeast using RIP-Chip were obtained from a compilation by Mittal et al. (14).

Protein and RNA annotation

A very recent census of proteins with experimental evidence of RNA-binding activity in human (1393 known RBP genes), mouse (1914 known RBPs) and yeast (1273 known RBPs) was used to flag proteins as known RBPs in RNAct (4). Additionally, an older census of 1542 RBPs, which used features such as domain composition and known roles of proteins, was used to flag a further 658 human RNAct proteins as known RBPs (3). Overall, 5097 proteins in RNAct are flagged as ‘Known RBPs’, with 2031 of these being human. In addition to annotated, known RBPs, we obtained predictions of RNA-binding activity from SONAR (26) (1923 predicted human RBPs) and catRAPID signature (27). catRAPID signature was used with a threshold score of 0.735, equivalent to a z-normalized value of 1 (one standard deviation above the mean) for the score distribution for known human RBPs from Hentze et al. (4), resulting in 1268 predicted human RBPs. Overall, 2779 human proteins in RNAct are flagged as ‘Predicted RBPs’, 1721 of these being novel (not ‘known’). RNA subcellular localization was obtained from the RNALocate database with very minor curation, removing a handful of ambiguous or non-subcellular terms (28). Basic protein annotation including gene symbols, full protein names and sequence length was obtained from UniProt. RNA annotation including transcript symbols (e.g. ‘TARDBP-201’), length, biotype (e.g. ‘protein coding’, ‘lincRNA’), GENCODE ‘basic’ status and TSL were obtained from GENCODE and Ensembl. Principal (primary) and alternative isoform classifications were obtained from APPRIS (29).

Technical aspects

RNAct is implemented in PHP on an Apache server using a MariaDB SQL backend, storing ∼450 GB of pre-sorted tables. The interaction predictions were calculated over several months on a shared set of 80 HP BL460c nodes with two Intel Xeon E5-2680 2.70 GHz CPUs and 120 GB of usable DDR3-1600 memory each, using 8 cores per cluster job. These are part of the CRG’s high-performance computing cluster. The open-source Bootstrap library was used to ensure correct display on devices of any screen size, including mobile devices. Several icons were included from Font Awesome and the Noun Project (please see the About section of the website for attributions). RNAct collects no data on its users.

USING RNAct

Search functionality

RNAct is built for extreme ease and speed of real-world use. The landing page (Search) contains a single search box which allows entry of any protein or RNA identifier (e.g. ‘tdp43’ or ‘hotair’). Unless the term is highly ambiguous (e.g. ‘ataxin’), most searches resolve to a single gene symbol, giving a choice of species and protein or RNA on the disambiguation page that follows (Figure 2). Table 1 shows a list of realistic search terms that are resolved successfully by RNAct. This is achieved by ‘guessing’ the identifier type, moving outwards from specific to more ambiguous options, if necessary. There is no built-in limit to the number of search results returned, allowing searches for e.g. ‘RNA binding’, ‘vault RNA’ or ‘lysine demethylase’.

Figure 2.

Search results (disambiguation page). This page allows selection of the protein or RNA of interest across the 3 species currently in RNAct.

Table 1.

Examples of realistic search terms successfully resolved by RNAct

Real-world search term	Retrieved gene symbol(s)	Retrieved description	Retrieved via
‘annexin 11’	ANXA11	Annexin A11	Partial description match
‘ews’	EWSR1	RNA-binding protein EWS	Gene symbol alias
ENSG00000089280	FUS	RNA-binding protein FUS	Ensembl gene identifier
FUS_MOUSE	FUS	RNA-binding protein FUS	UniProt identifier
P35637	FUS	RNA-binding protein FUS	UniProt accession
‘pur α’	PURA	Transcriptional activator protein Pur-α	Partial description match
‘smn’	SMN1	Survival motor neuron protein	Partial symbol match
	SMNDC1	Survival of motor neuron-related-splicing factor 30
‘tdp43’	TARDBP	TAR DNA-binding protein 43	Gene symbol alias, ignoring punctuation (via TDP-43)

This design minimizes tedious input elements (e.g. a species dropdown box) and instead facilitates discovery and comparison across protein families and species. Matching fields are highlighted in green, which allows intuitive selection of the intended match (e.g. the RNA transcript in question when searching for ‘ENST00000237536’) while leaving room for additional useful choices (e.g. the corresponding protein for transcript ‘ENST00000237536’). The search box is available in the top right of every page and is easily navigated to by pressing the tab key.
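The ‘guessing’ of identifier types, moving from specific to more ambiguous options, could look roughly like the sketch below. RNAct itself is implemented in PHP and its actual rules are not given in the text, so everything here is an assumption: the function names, the order of checks and the regular expressions. The UniProt accession pattern follows the format published by UniProt; the Ensembl patterns cover human and mouse style stable identifiers only.

```python
import re

# Checked in order, from the most specific pattern to the least specific.
# The patterns and type labels are illustrative assumptions, not RNAct's rules.
PATTERNS = [
    ("ensembl_gene", re.compile(r"ENS[A-Z]*G\d{11}")),
    ("ensembl_transcript", re.compile(r"ENS[A-Z]*T\d{11}")),
    ("uniprot_accession", re.compile(
        r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
    )),
    ("uniprot_entry_name", re.compile(r"[A-Z0-9]{1,10}_[A-Z0-9]{1,5}")),
]


def guess_identifier_type(term: str) -> str:
    """Return the first identifier type whose pattern matches the whole term."""
    candidate = term.strip().upper()
    for label, pattern in PATTERNS:
        if pattern.fullmatch(candidate):
            return label
    # Fall back to symbol, alias and description matching.
    return "free_text"


def normalize_alias(term: str) -> str:
    """Lower-case and drop punctuation, so 'TDP-43' and 'tdp43' compare equal."""
    return re.sub(r"[^a-z0-9]", "", term.lower())
```

For the search terms in Table 1, this sketch would classify ‘ENSG00000089280’ as an Ensembl gene identifier and ‘P35637’ as a UniProt accession, while ‘tdp43’ falls through to free-text matching, where punctuation-insensitive comparison links it to the alias TDP-43.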

Protein view

Once a protein of interest is selected, the Protein view (Figure 3) shows a list of RNA interaction partners prioritized by prediction score. Alternatively, the view can be sorted by experimental results simply by clicking on the experimental columns. The length, GENCODE ‘basic’ status, APPRIS classification and TSL (22) for each transcript are shown, allowing isoform quality assessment. Links out to Ensembl and UniProt for additional transcript and protein information respectively are provided (with an arrow symbol).

Figure 3.

The Protein view. This page shows a list of potential RNA interaction partners prioritized by catRAPID length-normalized prediction score. Alternatively, the page can be sorted by eCLIP experimental results by clicking on the ‘P-value’ or ‘fold change’ columns. Useful information on the protein of interest, such as whether it is a known or predicted RBP and whether experimental interaction data (e.g. from eCLIP experiments) exists for it is shown at the top of this view, and transcript annotation and quality information are shown as badges for each RNA. Links out to Ensembl and UniProt are provided. Other links lead to the protein’s or RNA’s view within RNAct.

RNA view

Once an RNA is selected, the RNA view shows a list of predicted protein interaction partners prioritized by prediction score. Alternatively, the view can be sorted by experimental results simply by clicking on the experimental columns. Interactions with experimental evidence are highlighted (14,24), as are known (3,4) and predicted (26,27) RBPs. Links out to Ensembl and UniProt for additional information are provided.

Advanced pairwise search

A common use case for RNAct is the prediction of interactions within a set of proteins and RNAs, allowing the rapid prioritization of candidates for validation, and the analysis of specific pathways or systems. The Pairwise search feature allows entry of a set of proteins and a set of RNAs, either in multiple lines or separated by commas, and allows any identifier types which the Search function can resolve, including ambiguous queries (e.g. for ‘lysine demethylase’). The only limitation is the total number of pairs queried, which is currently limited to 10 000 (allowing entry of e.g. 100 proteins and 100 RNAs).
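The input handling described above (identifiers on multiple lines or separated by commas, with a cap of 10 000 pairs) can be sketched as follows. The function names are hypothetical, and the sketch counts raw search terms; in RNAct an ambiguous term may resolve to several proteins or RNAs, so the real limit presumably applies after resolution, which is an assumption on our part.

```python
import re
from itertools import product
from typing import List, Tuple

MAX_PAIRS = 10_000  # limit stated in the text


def parse_identifiers(text: str) -> List[str]:
    """Split on commas or line breaks and drop empty entries."""
    items = (part.strip() for part in re.split(r"[,\n]", text))
    return [item for item in items if item]


def build_pairs(
    protein_text: str, rna_text: str, max_pairs: int = MAX_PAIRS
) -> List[Tuple[str, str]]:
    """Return every protein/RNA combination, refusing oversized queries."""
    proteins = parse_identifiers(protein_text)
    rnas = parse_identifiers(rna_text)
    n_pairs = len(proteins) * len(rnas)
    if n_pairs > max_pairs:
        raise ValueError(f"{n_pairs} pairs requested, limit is {max_pairs}")
    return list(product(proteins, rnas))
```

Because the limit applies to the product of the two set sizes, 100 proteins against 100 RNAs is allowed, as is 10 proteins against 1000 RNAs, whereas 101 against 100 is rejected.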

Browse proteins or RNAs

These views list all proteins or RNAs contained in RNAct, i.e. the human, mouse and yeast reference proteomes and transcriptomes. In the Browse Proteins view, proteins are listed in order of availability of experimental interaction data (e.g. from eCLIP), evidence of RNA-binding activity (known or predicted RBPs), species and gene symbol. This allows the easy retrieval of known RBPs, particularly those with experimental interaction data. In the Browse RNAs view, transcripts are sorted by species, gene symbol, GENCODE ‘basic’ status, APPRIS classification, TSL and descending transcript length. This means that the best-supported transcript for a given gene will appear first.
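The Browse RNAs ordering amounts to a multi-level sort key. The sketch below shows one way to express it, assuming a hypothetical `Transcript` record; the field names, the alphabetical ordering of species, the APPRIS labels and the treatment of missing annotation (sorted last) are all assumptions rather than details taken from RNAct.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Transcript:
    """Hypothetical record holding the fields used for sorting."""
    species: str
    gene_symbol: str
    name: str
    is_basic: bool          # GENCODE 'basic' status
    appris: Optional[str]   # "principal", "alternative" or None
    tsl: Optional[int]      # transcript support level, 1 (best) to 5
    length: int


APPRIS_RANK = {"principal": 0, "alternative": 1}  # unannotated sorts last


def browse_sort_key(t: Transcript) -> Tuple:
    return (
        t.species,
        t.gene_symbol,
        0 if t.is_basic else 1,                # 'basic' transcripts first
        APPRIS_RANK.get(t.appris, 2),          # principal before alternative
        t.tsl if t.tsl is not None else 6,     # lower TSL is better
        -t.length,                             # longest transcript first
    )


def sort_for_browsing(transcripts: List[Transcript]) -> List[Transcript]:
    return sorted(transcripts, key=browse_sort_key)
```

With this key, the first transcript listed for each gene is the ‘basic’, APPRIS principal isoform with the best TSL, with length used only to break remaining ties.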

Download

All RNAct protein–RNA interaction prediction data for human, mouse and yeast are available from the Download page. For human, the predictions are split into two sets for performance reasons: GENCODE ‘basic’ transcripts (covering a representative subset of 98 608 RNAs), and ‘non-basic’ transcripts making up the rest of the transcriptome. Both files can be concatenated for a full view of the human protein–RNA interactome, covering 20 778 proteins and 199 330 RNA transcripts. For mouse, only the GENCODE ‘basic’ transcripts are currently available, while the full annotated transcriptome is available for yeast. The RNAct predictions are licenced under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Licence (CC BY-NC-SA 4.0). A complete set of supporting tables containing protein and RNA annotations, identifier mappings used internally for searching, and the experimental data used (e.g. eCLIP) is available on the Download page as well. We intend to complete and add predictions for additional species such as C. elegans and D. melanogaster.
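The size of the downloadable data follows directly from the protein and transcript counts given in Materials and Methods. The short calculation below is a consistency check, assuming that every listed protein is paired with every listed transcript of the same species; the variable names are ours.

```python
# (proteins, transcripts) with successful predictions, as stated in the text.
COUNTS = {
    "human": (20_778, 98_608 + 100_722),  # 'basic' plus 'non-basic' transcripts
    "mouse": (22_080, 76_532),            # 'basic' transcripts only
    "yeast": (5_963, 7_029),
}


def pairs_per_species(counts):
    """Number of protein-RNA pairs per species (proteins x transcripts)."""
    return {species: p * r for species, (p, r) in counts.items()}


def total_pairs(counts):
    """Total number of protein-RNA pairs across all species."""
    return sum(pairs_per_species(counts).values())


if __name__ == "__main__":
    for species, n in pairs_per_species(COUNTS).items():
        print(f"{species}: {n / 1e9:.3f} billion pairs")
    print(f"total: {total_pairs(COUNTS) / 1e9:.2f} billion pairs")
```

The human transcript count sums to the 199 330 quoted above, and the total rounds to the 5.87 billion pairwise interactions reported in the Introduction, with human accounting for roughly 70% of all pairs.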

About

The About page gives more details on the algorithm and datasets used, provides literature references and answers what we expect to be frequently asked questions, including contact details.

DISCUSSION

RNAct provides an easy-to-use view of protein–RNA interactions in model organisms. It is intended to grow, both in terms of the number of species covered (currently human, mouse and yeast) and in terms of the experimental datasets provided. We hope our database will be particularly useful for studying gene regulatory events and networks at the post-transcriptional level (30). In addition to protein-centric datasets, recently published interactomes for the MALAT1, NEAT1 and NORAD long non-coding RNAs (lncRNAs) from a mass spectrometry-based method make it likely that additional RNA-centric datasets will be published in the near future (31). We are actively implementing features such as flagging interactions which are experimentally validated at low throughput, and allowing users to add articles supporting a given interaction. We also intend to highlight interactions supported by the presence of an RNA-binding domain and its corresponding motifs in future (32). Additionally, we are considering reporting the predicted binding regions for each interaction from catRAPID, similar to a CLIP binding profile, although this would require us to upgrade our server infrastructure due to the terabytes of data involved for all pairwise interactions. In summary, RNAct provides easy access to genome-scale protein–RNA interaction predictions with useful supporting annotation and experimental interaction evidence.

32 in total

1. Predicting protein associations with long noncoding RNAs.

Authors: Matteo Bellucci; Federico Agostini; Marianela Masin; Gian Gaetano Tartaglia
Journal: Nat Methods Date: 2011-06 Impact factor: 28.547

2. Interplay between posttranscriptional and posttranslational interactions of RNA-binding proteins.

Authors: Nitish Mittal; Tanja Scherrer; André P Gerber; Sarath Chandra Janga
Journal: J Mol Biol Date: 2011-04-09 Impact factor: 5.469

Review 3. A census of human RNA-binding proteins.

Authors: Stefanie Gerstberger; Markus Hafner; Thomas Tuschl
Journal: Nat Rev Genet Date: 2014-11-04 Impact factor: 53.242

Review 4. RNA-binding proteins in Mendelian disease.

Authors: Alfredo Castello; Bernd Fischer; Matthias W Hentze; Thomas Preiss
Journal: Trends Genet Date: 2013-02-15 Impact factor: 11.639

5. Neurodegenerative diseases: quantitative predictions of protein-RNA interactions.

Authors: Davide Cirillo; Federico Agostini; Petr Klus; Domenica Marchese; Silvia Rodriguez; Benedetta Bolognesi; Gian Gaetano Tartaglia
Journal: RNA Date: 2012-12-21 Impact factor: 4.942

6. GENCODE: the reference human genome annotation for The ENCODE Project.

Authors: Jennifer Harrow; Adam Frankish; Jose M Gonzalez; Electra Tapanari; Mark Diekhans; Felix Kokocinski; Bronwen L Aken; Daniel Barrell; Amonida Zadissa; Stephen Searle; If Barnes; Alexandra Bignell; Veronika Boychenko; Toby Hunt; Mike Kay; Gaurab Mukherjee; Jeena Rajan; Gloria Despacio-Reyes; Gary Saunders; Charles Steward; Rachel Harte; Michael Lin; Cédric Howald; Andrea Tanzer; Thomas Derrien; Jacqueline Chrast; Nathalie Walters; Suganthi Balasubramanian; Baikang Pei; Michael Tress; Jose Manuel Rodriguez; Iakes Ezkurdia; Jeltje van Baren; Michael Brent; David Haussler; Manolis Kellis; Alfonso Valencia; Alexandre Reymond; Mark Gerstein; Roderic Guigó; Tim J Hubbard
Journal: Genome Res Date: 2012-09 Impact factor: 9.043

7. ViennaRNA Package 2.0.

Authors: Ronny Lorenz; Stephan H Bernhart; Christian Höner Zu Siederdissen; Hakim Tafer; Christoph Flamm; Peter F Stadler; Ivo L Hofacker
Journal: Algorithms Mol Biol Date: 2011-11-24 Impact factor: 1.405

8. An integrated encyclopedia of DNA elements in the human genome.

Authors: ENCODE Project Consortium
Journal: Nature Date: 2012-09-06 Impact factor: 49.962

9. X-inactivation: quantitative predictions of protein interactions in the Xist network.

Authors: Federico Agostini; Davide Cirillo; Benedetta Bolognesi; Gian Gaetano Tartaglia
Journal: Nucleic Acids Res Date: 2012-10-22 Impact factor: 16.971

10. Photo-cross-linking and high-resolution mass spectrometry for assignment of RNA-binding sites in RNA-binding proteins.

Authors: Katharina Kramer; Timo Sachsenberg; Benedikt M Beckmann; Saadia Qamar; Kum-Loong Boon; Matthias W Hentze; Oliver Kohlbacher; Henning Urlaub
Journal: Nat Methods Date: 2014-08-31 Impact factor: 28.547
