
snoDB: an interactive database of human snoRNA sequences, abundance and interactions.

Philia Bouchard-Bourelle1, Clément Desjardins-Henri1, Darren Mathurin-St-Pierre1, Gabrielle Deschamps-Francoeur1, Étienne Fafard-Couture1, Jean-Michel Garant1, Sherif Abou Elela2, Michelle S Scott1.   

Abstract

Small nucleolar RNAs (snoRNAs) are an abundant type of non-coding RNA with conserved functions in all known eukaryotes. Classified into two main families, the box C/D and H/ACA snoRNAs, they carry out their best-characterized role, guiding site-specific modifications of ribosomal RNA, through the formation of specific ribonucleoprotein complexes, with fundamental implications for ribosome biogenesis. However, it is becoming increasingly clear that the landscape of snoRNA cellular functionality is much broader than it once seemed, with novel members, non-uniform expression patterns, new and diverse targets, and several emerging non-canonical functions ranging from the modulation of alternative splicing to the regulation of chromatin architecture. To facilitate the further characterization of human snoRNAs in a holistic manner, we introduce an online interactive database tool: snoDB. Its purpose is to consolidate information on human snoRNAs from different sources: sequence databases; canonical and non-canonical target information from the literature and from high-throughput RNA–RNA interaction datasets; and high-throughput sequencing data that can be visualized interactively.
© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2020        PMID: 31598696      PMCID: PMC6943035          DOI: 10.1093/nar/gkz884

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Small nucleolar RNAs (snoRNAs) are a conserved class of non-coding RNAs found in all eukaryotes and are most extensively characterized as guides for site-specific post-transcriptional modifications in ribosomal RNA (rRNA) (1,2). In addition, a small number of snoRNAs, such as SNORD3 and SNORD118, are known to play a role in the processing and maturation of rRNA. Two types of snoRNAs have been described, box C/D and box H/ACA snoRNAs, most of which are encoded in introns of host genes in humans (1,3). Box C/D and box H/ACA snoRNAs guide, respectively, the 2′-O-methylation and the pseudouridylation of their targets by direct base pairing. To do so, they associate with core binding proteins, which provide stability and catalytic activity, forming complexes known as snoRNPs (snoRNA ribonucleoprotein complexes) (4). In humans, 110 rRNA residues are known to be methylated by snoRNPs and 100 are pseudouridylated (5), although recent high-throughput sequencing and systematic comparative genomics efforts have identified additional likely candidates as well as positions that are fractionally modified (6–9). While the canonical features, functionality and targets of snoRNAs are well characterized, over the past decade a growing literature has exposed novel and unexpected aspects of snoRNA biology. High-throughput sequencing approaches suggest that snoRNAs can modify and/or otherwise interact with diverse RNAs, including other snoRNAs, transfer RNAs and messenger RNAs (mRNAs) (6,10–13). As reviewed in (14), many potential novel functions have been reported for snoRNAs in recent years, including the modulation of alternative splicing (15–17), an essential involvement in stress response pathways (18–20), the regulation of pre-mRNA stability (21) and the modulation of mRNA 3′ end processing (22). 
Moreover, high-throughput sequencing approaches and computational pipelines addressing the unique challenges of snoRNAs have been developed, resulting in more accurate quantification of snoRNAs and, simultaneously, of their host genes. These studies indicate that snoRNA expression levels cover a wide range and do not always mirror those of their host genes (23–25). Improved quantification and increased characterization have led to growing numbers of snoRNAs and host genes being implicated in disease. Examples of pathologies in which snoRNAs and their host genes play an important role and could be prime therapeutic targets include Prader-Willi syndrome and diverse cancers (26–30). However, in many cases, while the involvement of snoRNAs in disease is now known, the molecular mechanism remains unclear. Such is the case for SNORD118, mutations of which affect the expression, processing and protein binding of the snoRNA. Yet while SNORD118, like most snoRNAs, is ubiquitously expressed, germline mutations cause specific neurological phenotypes (31). The wealth of knowledge and data describing snoRNA biology requires careful management and integration so that the community can access and assimilate it easily. Unfortunately, much of the information regarding snoRNAs is disorganized, scattered across disparate online platforms and throughout the literature. For example, many RNA–RNA interactions have been detected for SNORD118 (11–13) and could be important for characterizing the molecular mechanism of its involvement in disease (Supplementary Figure S1), but mining them from high-throughput datasets in the literature is not straightforward. A central snoRNA resource would considerably facilitate the characterization of snoRNA functionality and involvement in disease. Three dedicated snoRNA resources are currently available for human: snoRNAbase (5), snOPY (32) and snoRNA Atlas (33). 
However, these resources have either not been kept up to date with the new snoRNA genes annotated, new interactors and functionalities, or have a different scope (e.g. snOPY is a database of snoRNA orthology). With so many key regulatory features emerging as intrinsic snoRNA functions, there is a pressing need to unify the scattered data currently available on human snoRNAs in order to optimize future research endeavors. The online interactive snoRNA database we propose, snoDB, aims to do that and more. Indeed, integrating available data is of great importance but snoDB further aspires to consolidate the above information with curated peer-reviewed high-throughput data in an effort to lead and incite research in the further characterization of the human snoRNA landscape in health and disease.

DATABASE CONTENT

SnoDB is based on the human hg38 reference genome assembly. It aims to be inclusive, integrating gene annotations and a wide diversity of features from all relevant available databases (Table 1). SnoRNA gene annotations were obtained from RefSeq (34), Ensembl (35) and RNAcentral (36), which in turn provides annotations from snOPY (32) and Rfam (37). Careful manual curation was carried out to consolidate the annotations and to ensure that no two snoRNA entries share the exact same genomic coordinates. When different names are employed for a given snoRNA gene, the RefSeq name was used by default; if absent, the RNAcentral or Ensembl name was used. In addition to the gene symbol, genomic coordinates and gene sequence, all additional names obtained from the HUGO Gene Nomenclature Committee (HGNC) (38) are available in the ‘synonym’ column, and all identifiers from the above databases are provided as links. SnoDB houses 2064 human snoRNAs, integrating the annotations of the above databases. In contrast, the other main snoRNA-centric resources, snoRNAbase (5), snOPY (32) and snoRNA Atlas (33), contain 402, 760 and 1118 human snoRNAs, respectively (Table 1).
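The consolidation step above (one entry per genomic locus, with a preferred name chosen by source) can be sketched as follows. This is a minimal illustration, not the snoDB curation procedure, which was carried out manually. The record type, the example names and coordinates are hypothetical, and because the text says only "the RNAcentral or the Ensembl name", ranking RNAcentral ahead of Ensembl is an assumption.

```python
from dataclasses import dataclass


# Hypothetical record: one snoRNA annotation as reported by one source database.
@dataclass(frozen=True)
class Annotation:
    source: str  # "RefSeq", "RNAcentral" or "Ensembl"
    name: str
    chrom: str
    start: int
    end: int
    strand: str


# Assumed name priority: RefSeq first, then RNAcentral, then Ensembl.
NAME_PRIORITY = ("RefSeq", "RNAcentral", "Ensembl")


def merge_annotations(annotations):
    """Collapse annotations with identical coordinates into one entry per locus.

    Returns {(chrom, start, end, strand): (chosen_name, other_names)}.
    """
    by_locus = {}
    for ann in annotations:
        key = (ann.chrom, ann.start, ann.end, ann.strand)
        by_locus.setdefault(key, []).append(ann)

    merged = {}
    for key, anns in by_locus.items():
        # Keep the name from the highest-priority source present at this locus.
        best = min(anns, key=lambda a: NAME_PRIORITY.index(a.source))
        others = sorted({a.name for a in anns} - {best.name})
        merged[key] = (best.name, others)
    return merged


annotations = [
    Annotation("Ensembl", "ENSG_EXAMPLE", "chr1", 1000, 1100, "+"),
    Annotation("RefSeq", "SNORDX", "chr1", 1000, 1100, "+"),
    Annotation("RNAcentral", "URS_EXAMPLE", "chr2", 500, 580, "-"),
]
for locus, (name, others) in merge_annotations(annotations).items():
    print(locus, name, others)
```

Keying on the full (chromosome, start, end, strand) tuple mirrors the stated rule that no two entries share the exact same coordinates; overlapping but non-identical loci would remain separate entries in this sketch.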
Table 1.

Features of human snoRNA databases

DatabasesnoRNA countLinks to external resourcesOrthology (O) and conservation (C)aHost gene characteristicsbrRNA and snRNA target dataNon-canonical target datacsnoRNA expression datadHost gene expression datadData available for download
snoRNAbase (5)402UCSC Genome Browser hg18 HGNC Genbank LiteratureO (to yeast)NCAL---
snOPY (32)760RefseqON----
snoRNA Atlas (33)1118RfamCN-E-
snoDB2064UCSC Genome Browser hg38 RefSeq HGNC Ensembl RNAcentral NCBI Rfam snoRNAbase snOPY snoRNA Atlas RISE database LiteratureOCdNBCALROPTLSOPTLS

aIn snoDB, links are provided to snOPY and Ensembl orthology pages when available and conservation data were obtained from snoRNA Atlas.

bHost gene characteristics: N: name; B: biotype; C: genomic coordinates; A: biological process annotation.

cNon-canonical target data are supported by articles in the literature (L) and by links to the RISE database (R).

dFor snoRNA Atlas: E indicates amalgamated expression values from ENCODE. For snoDB: all expression values were obtained using the low structure bias TGIRT-seq methodology. O: normal human ovary; P: normal human prostate; T: normal human testis; L: normal human liver; S: SKOV3ip1 human ovarian carcinoma cell line.

The snoRNA features available for display in snoDB also include host gene characteristics: a link to the Ensembl entry, the biotype, synonyms if relevant and genomic coordinates. In addition, snoDB features conservation data from snoRNA Atlas (33), orthology data from snOPY (32), snoRNA target data with enrichment details in select tissues from the Human Protein Atlas (39) when available, and expression data (Tables 1 and 2). Target data include known rRNA targets annotated in snoRNAbase (5) and rRNA targets confirmed by RiboMethSeq (8). Non-canonical interactors that were experimentally validated in the literature are also included, with links to the corresponding articles; these studies include (11,15–17,21), as described in the Introduction. Finally, RNA–RNA interaction data were incorporated from RISE, an RNA interactome database compiling results from multiple high-throughput RNA–RNA interaction studies (40), and the name and biotype of all RISE interactors are available. The abundance of both snoRNAs and their host genes, measured in various human tissues and cell lines using a low structure bias RNA-seq approach, is also available, as obtained from (23) and from GEO entry GSE126797. 
The snoDB back-end is built in PostgreSQL (9.5.1) as a relational database which is integrated into the Django web framework (1.6.5).
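The text states only that snoDB is a relational PostgreSQL database served through Django. To illustrate how the entities described above (snoRNAs, host genes and their targets) might relate to one another, here is a sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. The tables, columns and example rows are hypothetical and are not snoDB's actual schema.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only; sqlite3 stands in for
# PostgreSQL, and these are not the actual snoDB tables.
SCHEMA = """
CREATE TABLE host_gene (
    id INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL,
    biotype TEXT
);
CREATE TABLE snorna (
    id INTEGER PRIMARY KEY,
    symbol TEXT NOT NULL,
    box_type TEXT CHECK (box_type IN ('C/D', 'H/ACA', 'other')),
    chrom TEXT NOT NULL,
    chrom_start INTEGER NOT NULL,
    chrom_end INTEGER NOT NULL,
    strand TEXT NOT NULL,
    host_gene_id INTEGER REFERENCES host_gene(id),  -- NULL for intergenic snoRNAs
    UNIQUE (chrom, chrom_start, chrom_end, strand)  -- no two entries share coordinates
);
CREATE TABLE interaction (
    snorna_id INTEGER NOT NULL REFERENCES snorna(id),
    target TEXT NOT NULL,
    target_type TEXT NOT NULL,  -- e.g. 'rRNA', 'snRNA', 'non-canonical'
    evidence TEXT               -- e.g. 'snoRNAbase', 'literature', 'RISE'
);
"""


def build_demo_db():
    """Create an in-memory database populated with made-up example rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.execute("INSERT INTO host_gene VALUES (1, 'HOSTX', 'protein_coding')")
    con.executemany(
        "INSERT INTO snorna VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        [
            (1, "SNORDX", "C/D", "chr1", 1000, 1100, "+", 1),
            (2, "SNORAY", "H/ACA", "chr2", 500, 630, "-", None),
        ],
    )
    con.executemany(
        "INSERT INTO interaction VALUES (?, ?, ?, ?)",
        [
            (1, "28S", "rRNA", "snoRNAbase"),
            (1, "MRNAZ", "non-canonical", "RISE"),
            (2, "18S", "rRNA", "snoRNAbase"),
        ],
    )
    return con


def pairs_by_type(con):
    """Count snoRNA-target pairs per box type and target type (cf. Table 2)."""
    return con.execute(
        """
        SELECT s.box_type, i.target_type, COUNT(*)
        FROM interaction AS i
        JOIN snorna AS s ON s.id = i.snorna_id
        GROUP BY s.box_type, i.target_type
        ORDER BY s.box_type, i.target_type
        """
    ).fetchall()


print(pairs_by_type(build_demo_db()))
```

The UNIQUE constraint encodes the "no shared coordinates" rule at the database level, and a nullable host gene reference distinguishes intronic from intergenic snoRNAs; a Django model layer would typically express the same relations as foreign keys.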
Table 2.

Characteristics of snoRNAs in snoDB

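Category | Box C/D | Box H/ACA | Other | Total
All snoRNAs (a) | 1391 | 651 | 22 | 2064
Distinct snoRNA symbols (b) | 461 | 246 | 21 | 728
Intronic snoRNAs encoded in host genes | 423 | 318 | 3 | 744
Intergenic snoRNAs | 968 | 333 | 19 | 1320
snoRNA-target pairs | 1471 | 616 | 31 | 2118
• snoRNA-rRNA target pairs | 481 | 255 | 2 | 738
• snoRNA-snRNA target pairs | 113 | 64 | 7 | 184
• snoRNA-non-canonical target pairs (c) | 877 | 297 | 22 | 1196
snoRNAs with transcriptomic data | 524 | 469 | 3 | 996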
snoRNAs with transcriptomic data5244693996

aAll snoRNAs include snoRNAs with the same name and/or sequence but encoded in different genomic loci.

bCounts every snoRNA symbol only once. Some snoRNAs share the same symbol but have different IDs because they differ in sequence, in sequence length or in the genomic locus in which they are encoded.

cNon-canonical targets of snoRNAs include mRNAs and genomic regions not known to encode annotated genes.
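The counts in Table 2 are internally consistent: within each row, the box C/D, box H/ACA and "other" columns add up to the total; intronic and intergenic snoRNAs together account for all 2064 entries; and the rRNA, snRNA and non-canonical target pairs sum to the 2118 snoRNA-target pairs. A short check over the transcribed values makes these relationships explicit; the dictionary keys are labels chosen for this sketch.

```python
# Counts transcribed from Table 2 as (box C/D, box H/ACA, other, total).
TABLE2 = {
    "all": (1391, 651, 22, 2064),
    "distinct_symbols": (461, 246, 21, 728),
    "intronic": (423, 318, 3, 744),
    "intergenic": (968, 333, 19, 1320),
    "target_pairs": (1471, 616, 31, 2118),
    "rRNA_pairs": (481, 255, 2, 738),
    "snRNA_pairs": (113, 64, 7, 184),
    "noncanonical_pairs": (877, 297, 22, 1196),
    "with_transcriptomic_data": (524, 469, 3, 996),
}


def table2_inconsistencies(t):
    """Return a list of failed consistency checks (empty if the table adds up)."""
    problems = []
    for label, row in t.items():
        # C/D + H/ACA + other must equal the row total.
        if sum(row[:3]) != row[3]:
            problems.append(f"{label}: categories do not sum to total")
    parts = ("rRNA_pairs", "snRNA_pairs", "noncanonical_pairs")
    for col in range(4):
        # Every snoRNA is either intronic (in a host gene) or intergenic.
        if t["intronic"][col] + t["intergenic"][col] != t["all"][col]:
            problems.append(f"column {col}: intronic + intergenic != all")
        # Target pairs split into rRNA, snRNA and non-canonical targets.
        if sum(t[p][col] for p in parts) != t["target_pairs"][col]:
            problems.append(f"column {col}: target pair subtypes != total")
    return problems


print(table2_inconsistencies(TABLE2))  # → []
```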


WEB INTERFACE

The main page of snoDB is divided into four sections:

(i) As shown in Figure 1A, the top of the page displays snoDB’s logo adjacent to a search engine for snoRNA names. Immediately below the logo, a switch toggles snoDB’s sister tool snoTHAW (snoDB Table Heatmap Arrangement Widget), which enables the interactive visualization of the abundance of snoRNAs and their host genes. To the right of the logo are links to additional information pages (‘About’, ‘Tutorial’, ‘Statistics’ and ‘Experiment details’), as well as a link to download the whole database.

(ii) The section directly below (Figure 1B) features a menu bar with options related to the table. Clicking on ‘Column Options’ reveals a set of buttons with three kinds of functionality: toggling the visibility of single columns using the column visibility button, toggling the visibility of column groups using the color-coded buttons, and downloading the currently visible or selected rows of the table in TSV, BED or XLSX format. The ‘Advanced Search’ option reveals five search boxes (Figure 1B), each specific to a group of columns as indicated by its placeholder text and outline color. The ‘Reset Filters’ option clears all filtering currently active on the table, whether it comes from the main search at the top of the page, the advanced search boxes or the column-specific search boxes in the table itself. This option, along with the ‘Refresh Table’ option that follows it, exists because the state of all search inputs, column visibilities and row selections is saved when the page is refreshed. Hence, ‘Reset Filters’ clears all search fields without needing to refresh the page, while ‘Refresh Table’ reloads the page back to its default state.

(iii) Below the options menu, the main table dynamically displays the snoDB data (Figure 1C). 
(iv) The bottom of the page reveals snoTHAW when the switch at the top of the snoDB page is toggled. SnoTHAW enables interactive visualization of the RNA-seq expression data contained in snoDB (Figure 1D and Supplementary Figure S2). Currently, expression data can be displayed for four healthy tissues (breast, liver, ovary and prostate) as well as the SKOV3ip1 ovarian cancer cell line. In addition, box type, chromosome and conservation data from snoDB can be displayed on the heatmap's y-axis, and columns and rows can be re-ordered based on these features or on the expression data to suit the user's needs. All expression data in snoDB were generated using the TGIRT-seq approach, which allows accurate quantification and comparison of all cellular RNAs, including highly structured and modified RNAs such as snoRNAs (23–25), as described above. As more such datasets become available, they will also be incorporated into snoDB.
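Re-ordering heatmap rows by a snoRNA feature or by expression, as snoTHAW allows, can be sketched as a two-level sort. This is not snoTHAW's code: the record keys are hypothetical, and summarizing each snoRNA by its mean log2(value + 1) across tissues is an assumption made for the example.

```python
import math


def mean_log_abundance(record):
    """Mean of log2(value + 1) across tissues; the log transform is an assumption."""
    values = [math.log2(v + 1) for v in record["abundance"].values()]
    return sum(values) / len(values)


def order_rows(records, by="box_type"):
    """Order heatmap rows by a snoRNA feature or by abundance.

    records: list of dicts with hypothetical keys 'symbol', 'box_type',
    'chromosome' and 'abundance' (a dict mapping tissue -> value).
    """
    if by == "abundance":
        return sorted(records, key=mean_log_abundance, reverse=True)
    # Group rows by the chosen feature, most abundant first within each group.
    return sorted(records, key=lambda r: (r[by], -mean_log_abundance(r)))


records = [
    {"symbol": "A", "box_type": "H/ACA", "chromosome": "chr2",
     "abundance": {"liver": 3.0, "ovary": 1.0}},
    {"symbol": "B", "box_type": "C/D", "chromosome": "chr1",
     "abundance": {"liver": 0.0, "ovary": 0.0}},
    {"symbol": "C", "box_type": "C/D", "chromosome": "chr2",
     "abundance": {"liver": 15.0, "ovary": 7.0}},
]
print([r["symbol"] for r in order_rows(records)])               # ['C', 'B', 'A']
print([r["symbol"] for r in order_rows(records, "abundance")])  # ['C', 'A', 'B']
```

Note that sorting chromosome names as plain strings places "chr10" before "chr2"; a natural-sort key would be needed for karyotypic order.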
Figure 1.

Screenshot of the main page of snoDB displaying the site's four sections. (A) The snoDB logo, basic search engine and links to information pages. (B) A menu bar with options to control the content and appearance of the table. (C) snoDB’s main table where data are displayed and can be interacted with. By default, all 2064 snoRNA entries are shown and can be browsed by scrolling down. (D) The snoTHAW interface with the heatmap visualization beneath.

The main page offers three levels of querying. The first is a single search box to the left of the snoDB logo at the top of the page (Figure 1A). Clicking and/or typing in this box reveals a drop-down menu of all snoRNA symbols, which are listed in the table's first column (‘Symbol’). Multiple symbols can be selected, making this a quick and easy way to access information on a few snoRNAs of interest. The second consists of the five previously mentioned search boxes that appear above the table upon clicking on the ‘Advanced Search’ option. From left to right, the first searches the snoRNA symbol and synonym columns, the second all external ID columns, the third the host gene symbols and synonyms, the fourth the target columns, and the fifth is a global search covering the entire snoDB dataset. The first four search boxes operate on an exact-match basis, while the global search supports partial search terms. All five search boxes support regular expressions as well as multiple space-separated terms, so copy-pasting a column from a spreadsheet into the appropriate search box is an easy way to view numerous specific snoRNA entries. The third searching strategy is found within the table itself and provides individual search boxes for select columns, which also support multiple inputs. 
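The search behavior described above (exact matching in the column-group boxes, partial matching in the global box, regular expressions and multiple space-separated terms in all of them) can be approximated in a few lines. This sketch is not snoDB's implementation: treating multiple terms as alternatives (a row matches if any term matches), ignoring case, and splitting multi-valued cells on semicolons, commas or whitespace are all assumptions, and the example rows are illustrative.

```python
import re


def term_matches(term, cell, exact):
    """Match one search term (treated as a regular expression) against one cell."""
    pattern = re.compile(term, re.IGNORECASE)
    if exact:
        # Exact mode: the term must match a whole token, e.g. one synonym in a list.
        tokens = [tok for tok in re.split(r"[;,\s]+", cell) if tok]
        return any(pattern.fullmatch(tok) for tok in tokens)
    # Partial mode (global search): the term may match anywhere in the cell.
    return pattern.search(cell) is not None


def search(rows, columns, query, exact=True):
    """Keep rows where any space-separated term matches any of the given columns."""
    terms = query.split()
    return [
        row
        for row in rows
        if any(term_matches(t, str(row.get(c, "")), exact) for t in terms for c in columns)
    ]


rows = [
    {"symbol": "SNORD118", "synonyms": "U8"},
    {"symbol": "SNORD3A", "synonyms": "U3"},
    {"symbol": "SNORA73B", "synonyms": ""},
]
cols = ["symbol", "synonyms"]
print([r["symbol"] for r in search(rows, cols, "SNORD118 U3")])         # ['SNORD118', 'SNORD3A']
print([r["symbol"] for r in search(rows, cols, "SNORD")])               # []
print([r["symbol"] for r in search(rows, cols, "SNORD", exact=False)])  # ['SNORD118', 'SNORD3A']
print([r["symbol"] for r in search(rows, cols, "SNORA7.*")])            # ['SNORA73B']
```

The second query illustrates why a pasted list of full symbols works well in the exact-match boxes, while a fragment such as "SNORD" only returns hits in the partial-match global search.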
In addition to the interactive viewing and querying of columns, snoDB’s main table provides a frozen first column for seamless horizontal scrolling through many columns, row selection upon click (for visual highlighting and as input to snoTHAW), drag-and-drop column re-ordering, column sorting and numerous external links to the corresponding snoRNA entries in other databases. All of these functionalities are described on the ‘About’ page as well as through interactive examples in the Tutorial (Supplementary Figure S3). While having all data selectively displayable in a single interactive table is convenient, it can be impractical when one wishes to view all data for a single entry without scrolling horizontally back and forth. Therefore, clicking on any snoRNA in the ‘Symbol’ column opens a new tab with a page displaying all available information on that entry in a vertical format (Supplementary Figure S4). These individual data hubs are divided into familiar sub-sections and feature external links to all previously mentioned sources along with additional links for interaction data, all of which can be searched using the individual column search boxes present.
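snoDB offers BED export directly from the ‘Column Options’ menu. For readers who only have a TSV export at hand, the conversion to BED amounts to reordering fields and, if needed, shifting the start coordinate. In the sketch below, the column names (symbol, chr, start, end, strand) and the assumption that the TSV uses 1-based, inclusive coordinates are hypothetical and should be checked against an actual export before use.

```python
import csv
import io


def tsv_to_bed(tsv_text, one_based=True):
    """Convert a TSV table with hypothetical columns to 6-column BED lines.

    Assumed columns: symbol, chr, start, end, strand. If the TSV uses 1-based,
    inclusive coordinates (an assumption), the start is shifted by one because
    BED uses 0-based, half-open intervals; the end value is unchanged.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    bed_lines = []
    for row in reader:
        start = int(row["start"]) - (1 if one_based else 0)
        end = int(row["end"])
        # BED fields: chrom, chromStart, chromEnd, name, score, strand.
        fields = [row["chr"], str(start), str(end), row["symbol"], "0", row["strand"]]
        bed_lines.append("\t".join(fields))
    return bed_lines


tsv = "symbol\tchr\tstart\tend\tstrand\nSNORDX\tchr1\t1001\t1100\t+\n"
print(tsv_to_bed(tsv))  # ['chr1\t1000\t1100\tSNORDX\t0\t+']
```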

CONCLUSION AND FUTURE PLANS

The snoDB interactive web application is a holistic relational database which consolidates diverse information regarding human snoRNAs from key sources, curated articles and datasets in an attempt to facilitate further research in the field of snoRNAs. Along with minor periodic updates, additional high-throughput datasets will be incorporated in snoDB as they become available.

DATA AVAILABILITY

http://scottgroup.med.usherbrooke.ca/snoDB/
  40 in total

1.  Matching of Soulmates: coevolution of snoRNAs and their targets.

Authors:  Stephanie Kehr; Sebastian Bartschat; Hakim Tafer; Peter F Stadler; Jana Hertel
Journal:  Mol Biol Evol       Date:  2013-10-24       Impact factor: 16.240

2.  Proteomics. Tissue-based map of the human proteome.

Authors:  Mathias Uhlén; Linn Fagerberg; Björn M Hallström; Cecilia Lindskog; Per Oksvold; Adil Mardinoglu; Åsa Sivertsson; Caroline Kampf; Evelina Sjöstedt; Anna Asplund; IngMarie Olsson; Karolina Edlund; Emma Lundberg; Sanjay Navani; Cristina Al-Khalili Szigyarto; Jacob Odeberg; Dijana Djureinovic; Jenny Ottosson Takanen; Sophia Hober; Tove Alm; Per-Henrik Edqvist; Holger Berling; Hanna Tegel; Jan Mulder; Johan Rockberg; Peter Nilsson; Jochen M Schwenk; Marica Hamsten; Kalle von Feilitzen; Mattias Forsberg; Lukas Persson; Fredric Johansson; Martin Zwahlen; Gunnar von Heijne; Jens Nielsen; Fredrik Pontén
Journal:  Science       Date:  2015-01-23       Impact factor: 47.728

3.  Small nucleolar RNAs U32a, U33, and U35a are critical mediators of metabolic stress.

Authors:  Carlos I Michel; Christopher L Holley; Benjamin S Scruggs; Rohini Sidhu; Rita T Brookheart; Laura L Listenberger; Mark A Behlke; Daniel S Ory; Jean E Schaffer
Journal:  Cell Metab       Date:  2011-07-06       Impact factor: 27.287

4.  In Vivo Mapping of Eukaryotic RNA Interactomes Reveals Principles of Higher-Order Organization and Regulation.

Authors:  Jong Ghut Ashley Aw; Yang Shen; Andreas Wilm; Miao Sun; Xin Ni Lim; Kum-Loong Boon; Sidika Tapsin; Yun-Shen Chan; Cheng-Peow Tan; Adelene Y L Sim; Tong Zhang; Teodorus Theo Susanto; Zhiyan Fu; Niranjan Nagarajan; Yue Wan
Journal:  Mol Cell       Date:  2016-05-12       Impact factor: 17.970

Review 5.  Box C/D small nucleolar RNA genes and the Prader-Willi syndrome: a complex interplay.

Authors:  Jérôme Cavaillé
Journal:  Wiley Interdiscip Rev RNA       Date:  2017-03-13       Impact factor: 9.957

6.  Human box C/D snoRNA processing conservation across multiple cell types.

Authors:  Michelle S Scott; Motoharu Ono; Kayo Yamada; Akinori Endo; Geoffrey J Barton; Angus I Lamond
Journal:  Nucleic Acids Res       Date:  2011-12-22       Impact factor: 16.971

Review 7.  Assembly and trafficking of box C/D and H/ACA snoRNPs.

Authors:  Séverine Massenet; Edouard Bertrand; Céline Verheggen
Journal:  RNA Biol       Date:  2016-10-07       Impact factor: 4.652

8.  High-throughput identification of C/D box snoRNA targets with CLIP and RiboMeth-seq.

Authors:  Rafal Gumienny; Dominik J Jedlinski; Alexander Schmidt; Foivos Gypas; Georges Martin; Arnau Vina-Vilaseca; Mihaela Zavolan
Journal:  Nucleic Acids Res       Date:  2017-03-17       Impact factor: 16.971

9.  Genenames.org: the HGNC and VGNC resources in 2017.

Authors:  Bethan Yates; Bryony Braschi; Kristian A Gray; Ruth L Seal; Susan Tweedie; Elspeth A Bruford
Journal:  Nucleic Acids Res       Date:  2016-10-30       Impact factor: 16.971

10.  Profiling of 2'-O-Me in human rRNA reveals a subset of fractionally modified positions and provides evidence for ribosome heterogeneity.

Authors:  Nicolai Krogh; Martin D Jansson; Sophia J Häfner; Disa Tehler; Ulf Birkedal; Mikkel Christensen-Dalsgaard; Anders H Lund; Henrik Nielsen
Journal:  Nucleic Acids Res       Date:  2016-06-01       Impact factor: 16.971
