BSRD: a repository for bacterial small regulatory RNA.

Lei Li, Dandan Huang, Man Kit Cheung, Wenyan Nong, Qianli Huang, Hoi Shan Kwan.

Abstract

In bacteria, small regulatory non-coding RNAs (sRNAs) are the most abundant class of post-transcriptional regulators. They are involved in diverse processes including quorum sensing, stress response, virulence and carbon metabolism. Recent developments in high-throughput techniques, such as genomic tiling arrays and RNA-Seq, have allowed efficient detection and characterization of bacterial sRNAs. However, a comprehensive repository to host sRNAs and their annotations is not available. Existing databases cover only a limited number of bacterial species or sRNAs. In addition, these databases do not have tools to integrate or analyse high-throughput sequencing data. Here, we have developed BSRD (http://kwanlab.bio.cuhk.edu.hk/BSRD), a comprehensive bacterial sRNA database, as a repository for published bacterial sRNA sequences with annotations and expression profiles. BSRD contains over nine times more experimentally validated sRNAs than any other available database. BSRD also provides combinatorial regulatory networks of transcription factors and sRNAs with their common targets. We have built and implemented in BSRD a novel RNA-Seq analysis platform, sRNADeep, to characterize sRNAs in large-scale transcriptome sequencing projects. We will update BSRD regularly.

Year:  2012        PMID: 23203879      PMCID: PMC3531160          DOI: 10.1093/nar/gks1264

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Small regulatory RNAs (sRNAs) in bacteria are a class of non-coding RNA genes. They are usually 50–500 bp long, and a typical bacterial genome is estimated to encode ∼200–300 of them (1). sRNAs can be categorized as cis-encoded antisense sRNAs, which are completely complementary to their targets, and trans-encoded antisense sRNAs, which are only partially complementary to their targets and whose binding is facilitated by the RNA-binding protein Hfq (2). sRNAs are important post-transcriptional regulators: they can either inhibit translation by triggering mRNA degradation or masking the ribosome binding site, or activate translation by opening the ribosome binding site or increasing mRNA stability (3). They are involved in many crucial cellular processes, including biofilm formation (4) and quorum sensing (5). sRNAs may directly or indirectly regulate most bacterial genes (6). Since the first discovery of the chromosome-encoded sRNA regulator, MicF, in Escherichia coli (7), detection of sRNAs by traditional genetic screening methods has been hampered by their relatively small size, non-protein-coding nature and location in intergenic regions (8). As a result, only a limited number of sRNAs had been identified. Recent advances in computational methods and in high-throughput techniques, such as genomic tiling microarrays and deep sequencing, have led to the discovery of many sRNAs and provided invaluable insights into the detection and characterization of bacterial sRNAs. Although sRNAs play crucial regulatory roles and their discovery has been greatly facilitated in recent years, the quantity and quality of currently available sRNA databases are far from desirable. Databases such as RegulonDB (9) and EcoCyc (10) include only sRNAs in the E. coli K12 strain. Rfam (11), which is a database of structural RNA families, contains only a few bacterial sRNA families. sRNAMap (12) contains only data from gram-negative strains and has not been updated recently.
Other databases focus on sRNA targets: sRNATarBase (13) contains exclusively experimentally validated sRNA targets, whereas RNApredator (14) and sRNATarget (15) are based solely on computational predictions. Furthermore, annotations of sRNAs in most of these databases are not up to date, and these databases can neither integrate nor analyse next-generation sequencing data. Therefore, a repository that collects the sequences of all published sRNAs together with information such as their annotations and expression profiles is needed. Here, we present BSRD, a comprehensive bacterial sRNA database, to serve as a repository for bacterial sRNA sequences and their annotations and expression profiles. In addition to sRNAs annotated in the public databases, we also include in BSRD manually curated bacterial sRNAs and their annotations from the literature. Besides identification of sRNAs, BSRD also provides extensive information on functional characterizations and expression profiles of sRNAs. Furthermore, we have developed and implemented in BSRD a new RNA-Seq analysis platform, sRNADeep, for characterization of sRNAs from high-throughput deep sequencing data.

DATA COLLECTION AND CURATION

Acquisition of sRNA sequences

BSRD contains three kinds of sRNA sequences grouped according to their discovery methods: (i) by experimental validation, (ii) by sequence and structural conservation, and (iii) by RNA-Seq or tiling microarray experiments. We have obtained 79 and 87 experimentally validated sRNAs from RegulonDB and sRNAMap, respectively. For literature mining, we first obtained a list of bacterial strains from the NCBI taxonomy and then searched the NCBI PubMed database with the PubCrawler program (16), using the keyword ‘sRNA’ together with each strain name. sRNA information was then extracted manually from all of the resulting 445 relevant articles. This information includes sRNA name or alias, species, physical position, strand, identification method, growth phase, Gene Expression Omnibus accession, target genes and regulation effect, and regulators. sRNAs were regarded as experimentally validated if they had been identified by either Northern blot or reverse transcription polymerase chain reaction. In total, 964 experimentally validated sRNAs were retrieved and added to BSRD. A total of 6266 sRNA homologs were collected from the Rfam database. In addition, 2334 bacterial genes annotated as ‘ncRNA’ or ‘antisense RNA’ in the NCBI Gene database were collected, and a further 310 sRNAs were retrieved from sRNAMap. As a result, a total of 8248 non-redundant sRNA homologs were added to BSRD. We have also obtained and added to BSRD 507 candidate sRNAs identified from high-throughput sequencing datasets. These sRNAs display either differential expression across conditions or high expression in a single condition. However, as current computational prediction methods for novel sRNAs have low precision (6–12%) and sensitivity (20–49%) (17), datasets predicted solely in silico were not included in BSRD. In addition, 20 115 bacterial regulatory elements were integrated.
For sRNAs found in multiple resources, exact duplicate hits were merged, but we kept the others, which need to be further verified by rapid amplification of cDNA ends (RACE) or other experimental techniques. In BSRD, a new sRNA nomenclature system modified from Chen’s system (18) is used: an sRNA name begins with an initial ‘s’, which stands for small RNA, followed by the three-letter genome ID used in the KEGG database and a number that indicates its genomic location. We also add an ending number that indicates the number of sRNAs identified at this location.
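The naming rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `make_srna_id`, the treatment of the location as a plain integer and the `.` separator before the ending number are hypothetical, since the text does not specify the exact formatting.

```python
def make_srna_id(kegg_genome_id: str, location: int, ordinal: int = 1) -> str:
    """Build an sRNA name following the scheme described in the text.

    Layout: 's' + three-letter KEGG genome ID + location number + ending number.
    The '.' placed before the ending number is an assumed separator.
    """
    if len(kegg_genome_id) != 3 or not kegg_genome_id.isalpha():
        raise ValueError("KEGG genome ID must be exactly three letters")
    if location < 0 or ordinal < 1:
        raise ValueError("location must be >= 0 and ordinal must be >= 1")
    return f"s{kegg_genome_id.lower()}{location}.{ordinal}"


# Hypothetical example: second sRNA recorded at location 1234 in genome 'eco'.
example_id = make_srna_id("eco", 1234, 2)
```

Keeping the ending number as a separate argument means two sRNAs reported at the same location receive distinct names without changing the location part.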

Functional annotations of identified sRNAs

In BSRD, each sRNA entry contains seven sections of descriptions: Basic Info, UCSC Browser, Secondary Structure, Expression Profile, Target Info, Wikipedia and Other Links (Figure 1). The ‘Basic Info’ section provides the sRNA sequence and information such as identification method, terminators, Hfq binding and growth phase. Positions of sRNAs can be visualized graphically with the UCSC Archaeal Genome Browser (19) implemented in BSRD. Secondary structures of sRNAs are visualized using the RNAfold (20) and Mfold (21) programs. The ‘Expression Profile’ section provides expression evidence of sRNAs in different experimental conditions collected from the NCBI Gene Expression Omnibus (22) database. sRNA pathogenesis profiling data obtained by the recently emerging Tn-Seq approach (23) are also included.
Figure 1. Overview of BSRD design. The three main characteristics of BSRD are (i) comprehensive data collection from external databases and the literature, (ii) comprehensive annotation and expression profiles for sRNAs and (iii) a novel RNA-Seq analysis platform, sRNADeep, for characterizing sRNAs from high-throughput sequencing data.

Identification of sRNA targets is an initial step in understanding the regulatory function of sRNAs. We have acquired 138 sRNA-target interactions from sRNATarBase and manually curated 56 new sRNA-target interactions from the literature. The sRNA-target interactions were then combined with transcription factor-target interactions to form the regulatory networks. Sigma factors, which act as upstream regulators of sRNA transcription (24), were also added to the networks. Moreover, target genes of identified sRNAs predicted using IntaRNA (25) and RNAplex (26) are also provided. As the Wikipedia-based community annotation platform has been successful in Rfam and miRBase (27), the same platform is also implemented in BSRD. Wikipedia pages for all sRNA entries were reviewed manually to guard against vandalism before being incorporated into BSRD. As most sRNAs still do not have annotation pages in Wikipedia, a link to a brief guide for creating and editing a new Wikipedia page is also provided. Finally, we have provided cross-links to a selected list of external databases, including Rfam, the Gene Ontology (28), Sequence Ontology (29), RegulonDB, EchoBASE (30), EcoGene (31) and EcoCyc, for access to additional information on the sRNAs.
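Combining sRNA-target and transcription factor-target interactions to find common targets can be illustrated with a short Python sketch. It is not the BSRD implementation: the function name `shared_targets`, the edge representation as `(regulator, target)` pairs and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def shared_targets(srna_edges, tf_edges):
    """Return targets regulated by at least one sRNA and at least one TF.

    Each edge is a (regulator, target) pair. The result maps every shared
    target to a (sorted sRNA list, sorted TF list) tuple.
    """
    srnas_by_target = defaultdict(set)
    for srna, target in srna_edges:
        srnas_by_target[target].add(srna)

    tfs_by_target = defaultdict(set)
    for tf, target in tf_edges:
        tfs_by_target[target].add(tf)

    # Only targets present in both interaction sets form a combined network.
    common = srnas_by_target.keys() & tfs_by_target.keys()
    return {
        target: (sorted(srnas_by_target[target]), sorted(tfs_by_target[target]))
        for target in sorted(common)
    }


# Hypothetical toy data, not taken from BSRD.
srna_edges = [("sRNA_A", "gene1"), ("sRNA_B", "gene2"), ("sRNA_B", "gene1")]
tf_edges = [("TF_X", "gene1"), ("TF_Y", "gene3")]
network = shared_targets(srna_edges, tf_edges)
```

With this toy input only `gene1` is kept, because it is the only target that appears in both interaction lists.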

BSRD INTERFACE AND FUNCTIONALITIES

There are nine sections in the BSRD main menu: Home, Search BSRD, Hierarchical taxonomy, Regulatory network, BLAST BSRD, Download, sRNADeep, Submission and Latest publications. On the ‘Home’ page, the ‘Current release’ section summarizes the numbers of sRNAs archived in the latest version of the database. BSRD hosts 9579 sRNA entries from 957 bacterial strains. Answers to the most frequently asked questions are provided in the ‘FAQ’ section, and help documentation is included in the ‘Help’ section. The ‘Latest Update’ section provides news about recent updates of the database. On the ‘Search BSRD’ page, users can search for sRNAs with three options: by sRNA id or name, by sRNA class or by genomic position. Alternatively, users can browse sRNAs by host organism from the list of bacteria on the ‘Hierarchical taxonomy’ page. Users can also go to the ‘BLAST BSRD’ page to enter or upload sRNA sequences for a quick search against BSRD using the BLAST program. Results are sorted by alignment score, with single nucleotide variations highlighted (Supplementary Figure S1). The ‘Download’ page allows users to download sRNA sequences in FASTA format by sRNA id or name, by bacterial host or in batch. The ‘Submission’ page allows users to submit new sRNAs or annotations to BSRD. The ‘Latest publications’ page enables users to access the latest articles related to ‘bacterial sRNAs’ in PubMed and Microsoft Academic Research Databases.

Regulatory network

Transcription factors and sRNAs can bind to the same target, and their clearance rates, steady-state concentrations and response curves determine the dynamics of the resulting regulatory networks (32). Regulatory networks in BSRD are constructed using Cytoscape Web (33) (Figure 2). Network elements are distinguished by colour: sRNA (yellow), target gene (orange), sigma factor (red) and transcription factor (blue). Regulatory relationships are distinguished by line ending: repression (T-shaped) and induction (arrow).
Figure 2. Screenshots of the regulatory network page. (a) Overview of the regulatory network. (b) Each target gene in the network links out to the UniProt database. (c) Clicking on a relationship line shows a description of the regulatory effect.

sRNADeep

sRNADeep is a novel platform for sRNA expression profiling from RNA-Seq data (Supplementary Figure S2). It can not only annotate expressed sRNAs from a single set of transcriptome data, but also identify differentially expressed sRNAs between two different conditions. sRNADeep accepts compressed archives of clean reads, that is, raw reads that have been filtered by adapter removal and quality trimming. The reads are mapped against the non-redundant sRNA set in BSRD using the Burrows-Wheeler Aligner (BWA) (34), with a maximum of one mismatch allowed. The expectation maximization-based SEQ-EM algorithm (35) is used to handle multi-mapped reads. For a single dataset, the number of reads for each sRNA is calculated and normalized using the reads per kilobase per million (RPKM) method (36). For analysis of two datasets, DESeq (37) is used to identify differentially expressed sRNAs between the samples. On job submission, users should provide a valid email address, to which sRNADeep sends the assigned job ID for result retrieval. A typical output of sRNADeep includes the length distribution of clean reads, the distribution of mapped reads and the expression levels of sRNAs or the differentially expressed sRNAs (Supplementary Figure S3).
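The single-dataset normalization step can be sketched as follows. This is a minimal illustration of the standard RPKM formula, not sRNADeep's own code: the function name `rpkm`, the dictionary inputs and the choice to fall back on the sum of the supplied counts when no library size is given are assumptions.

```python
def rpkm(read_counts, lengths_nt, total_mapped_reads=None):
    """Reads per kilobase per million mapped reads for each sRNA.

    RPKM = reads * 1e9 / (length_in_nt * total_mapped_reads)
    """
    if total_mapped_reads is None:
        # Assumption: normalise by the reads mapped to the sRNA set itself.
        total_mapped_reads = sum(read_counts.values())
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")

    values = {}
    for name, count in read_counts.items():
        length = lengths_nt[name]
        if length <= 0:
            raise ValueError(f"length of {name} must be positive")
        values[name] = count * 1_000_000_000 / (length * total_mapped_reads)
    return values


# Hypothetical counts and lengths for two sRNAs, 1,000,000 mapped reads in total.
example = rpkm(
    {"sRNA_A": 100, "sRNA_B": 300},
    {"sRNA_A": 100, "sRNA_B": 200},
    total_mapped_reads=1_000_000,
)
```

In this toy case `sRNA_A` gets an RPKM of 1000 and `sRNA_B` 1500: the second sRNA has three times the reads but is twice as long, so its normalized expression is only 1.5 times higher.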

DISCUSSION

Compared with other currently available sRNA-related databases, BSRD is more advanced in three aspects. First, BSRD hosts the largest collection of sRNAs (Table 1). It encompasses 964 experimentally validated sRNAs, 8248 sRNA homologs and 507 candidate sRNAs from high-throughput datasets. sRNAMap, for instance, collects only 87 validated sRNAs and 310 sRNA homologs.
Table 1. Comparison of BSRD with other available resources

Resources compared, in column order: RegulonDB, sRNAMap, Wikipedia, Rfam, BSRD. Each row lists its non-empty cells in column order; the resource is named where the text states it.

No. of experimentally validated sRNAs: 79 (RegulonDB), 87 (sRNAMap), 99, 964 (BSRD)
No. of sRNA homologs: 310 (sRNAMap), 6266 (Rfam), 8248 (BSRD)
No. of sRNA-target interactions: 26, 60, 194 (BSRD)
No. of genomes: 1, 70, 957 (BSRD)
Growth phase: 373
Secondary structure of sRNAs: Yes, Yes, Yes
Expression profiles supported: Yes, Yes
Fitness of sRNAs: Yes
Sequence homology search: Yes, Yes, Yes
Wikipedia-derived community annotation: Yes, Yes, Yes
Deep sequencing read analysis supported: Yes
Computational interactions of sRNAs and targets: Yes
Regulatory network: Yes, Yes
Candidate sRNAs from high-throughput transcriptome studies: -, Yes
Second, BSRD not only provides extensive functional descriptions for sRNAs, but also includes multiple new sRNA annotations from manually curated literature mining, including growth phase, Hfq binding and Rho-independent terminators. It also gives access to large-scale target predictions for identified sRNAs. We have also integrated information on upstream sigma factors into the sRNA regulatory networks for a more comprehensive visualization of regulatory functions. Third, although recent developments in deep sequencing technology have advanced sRNA research, web-based tools for annotating sRNAs from high-throughput sequencing data are unavailable. We have thus developed sRNADeep to meet this need. We evaluated the performance of sRNADeep with the transcriptome data of Listeria monocytogenes (38). All 13 differentially expressed sRNAs previously reported were successfully recovered by sRNADeep. In addition, nine previously uncharacterized sRNAs were identified by sRNADeep. sRNADeep could be a useful tool for characterizing sRNAs from deep sequencing data.

FUTURE DEVELOPMENTS

We will continue to import information concerning new bacterial genomes and update sRNA annotations in BSRD. We also welcome submissions of novel sRNAs or annotations. Because of the expanded use of high-throughput deep sequencing, we expect to develop functions such as the evaluation of the effects of sRNA binding from transcriptome data, and prediction of novel sRNAs by an improved version of sRNADeep.

AVAILABILITY

BSRD is freely available at http://kwanlab.bio.cuhk.edu.hk/BSRD. All sRNA sequences are also available for download in FASTA format. There are no access restrictions for academic and commercial use. The content of BSRD is freely available under the ODC Open Database License.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online: Supplementary Table 1 and Supplementary Figures 1–3.
  37 in total

1.  EcoGene: a genome sequence database for Escherichia coli K-12.

Authors:  K E Rudd
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  sRNATarBase: a comprehensive database of bacterial sRNA targets verified by experiments.

Authors:  Yuan Cao; Jiayao Wu; Qian Liu; Yalin Zhao; Xiaomin Ying; Lei Cha; Ligui Wang; Wuju Li
Journal:  RNA       Date:  2010-09-15       Impact factor: 4.942

3.  RNAplex: a fast tool for RNA-RNA interaction search.

Authors:  Hakim Tafer; Ivo L Hofacker
Journal:  Bioinformatics       Date:  2008-04-23       Impact factor: 6.937

4.  A unique mechanism regulating gene expression: translational inhibition by a complementary RNA transcript (micRNA).

Authors:  T Mizuno; M Y Chou; M Inouye
Journal:  Proc Natl Acad Sci U S A       Date:  1984-04       Impact factor: 11.205

5.  The small RNA chaperone Hfq and multiple small RNAs control quorum sensing in Vibrio harveyi and Vibrio cholerae.

Authors:  Derrick H Lenz; Kenny C Mok; Brendan N Lilley; Rahul V Kulkarni; Ned S Wingreen; Bonnie L Bassler
Journal:  Cell       Date:  2004-07-09       Impact factor: 41.582

6.  A survey of small RNA-encoding genes in Escherichia coli.

Authors:  Ruth Hershberg; Shoshy Altuvia; Hanah Margalit
Journal:  Nucleic Acids Res       Date:  2003-04-01       Impact factor: 16.971

7.  Direct comparison of small RNA and transcription factor signaling.

Authors:  Razika Hussein; Han N Lim
Journal:  Nucleic Acids Res       Date:  2012-05-22       Impact factor: 16.971

8.  Small RNAs in the genus Clostridium.

Authors:  Yili Chen; Dinesh C Indurthi; Shawn W Jones; Eleftherios T Papoutsakis
Journal:  mBio       Date:  2011-01-25       Impact factor: 7.867

9.  The UCSC Archaeal Genome Browser: 2012 update.

Authors:  Patricia P Chan; Andrew D Holmes; Andrew M Smith; Danny Tran; Todd M Lowe
Journal:  Nucleic Acids Res       Date:  2011-11-12       Impact factor: 16.971

10.  Control of virulence by small RNAs in Streptococcus pneumoniae.

Authors:  Beth Mann; Tim van Opijnen; Jianmin Wang; Caroline Obert; Yong-Dong Wang; Robert Carter; Daniel J McGoldrick; Granger Ridout; Andrew Camilli; Elaine I Tuomanen; Jason W Rosch
Journal:  PLoS Pathog       Date:  2012-07-12       Impact factor: 6.823
