
siRecords: a database of mammalian RNAi experiments and efficacies.

Yongliang Ren, Wuming Gong, Haiyan Zhou, Yejun Wang, Feifei Xiao, Tongbin Li.

Abstract

RNAi-based gene-silencing techniques offer a fast and cost-effective way of knocking down genes' functions in an easily regulated manner. Exciting progress has been made in recent years in the application of these techniques in basic biomedical research and therapeutic development. However, it remains a difficult task to design siRNA experiments with high efficacy and specificity. We present siRecords, an extensive database of mammalian RNAi experiments with consistent efficacy ratings. This database serves two purposes. First, it provides a large and diverse dataset of siRNA experiments. This dataset faithfully represents the general, diverse RNAi experimental practice, and allows more reliable siRNA design tools to be developed with the overfitting problem well curbed. Second, the database helps experimental RNAi researchers directly by providing them with the efficacy and other information about the siRNA experiments designed and conducted previously against their genes of interest. The current release of siRecords contains the records of 17,192 RNAi experiments targeting 5086 genes.


Year:  2008        PMID: 18996894      PMCID: PMC2686443          DOI: 10.1093/nar/gkn817

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

RNA interference (or RNAi) is a recently discovered, naturally occurring mechanism for sequence-specific, post-transcriptional down-regulation of gene expression (1). Because RNAi-based gene knockdown techniques (using siRNAs, or small interfering RNAs) offer a fast and cost-effective way of disrupting genes’ functions in an easily regulated manner, rapid progress has been made in recent years in the application of these techniques in basic biomedical research and clinical development. In the basic research domain, siRNAs have become a standard gene knockdown tool routinely used in molecular genetics and functional genomics laboratories (2,3). In the clinical domain, several RNAi-based therapies against ocular diseases (e.g. AMD or age-related macular degeneration), virus infection (by Hepatitis B and C, and HIV), cancers (e.g. solid tumors) and inflammatory diseases have reached the clinical or pre-clinical trial stage in development (4–6), and a large number of other RNAi-based potential therapeutic agents are actively being explored (7,8). The successful employment of an RNAi-based gene knockdown technique depends on the proper design or selection of the siRNAs, and the adoption of an effective strategy to deliver the siRNAs to the target cells or tissues (4,9). The purpose of designing siRNAs is to choose, from a large number of candidate siRNA sites, the ones likely to achieve high potency/efficacy and good specificity (against off-target activity). A properly devised delivery system (using, e.g. viral or non-viral vectors, conjugates, cationic liposomes, or complexes with peptides, polymers, antibodies and aptamers) helps to improve the stability of the siRNA agent, and to reduce or eliminate the innate immune response and/or other harmful side-effects induced by the siRNA agent (5,7,10). The issue of how to design siRNAs that produce high efficacy is the focus of a large body of recent research work [see recent reviews, e.g. (11–16)].
Since it was discovered that not all siRNAs are equally potent in their ability to silence the gene products (17), a series of studies have pointed to a large number of ‘features’ that might be correlated with higher efficacy of RNAi experiments. These features can be roughly classified into three categories. The first category comprises sequence features, including direct sequence features, which are defined by the nucleotide identity at particular positions of the siRNA, e.g. the 6th nucleotide of the siRNA sequence is an ‘A’ (18,19), and sequence-derived features, e.g. the G/C content of the siRNA is between 30% and 52% (20), and there are no occurrences of more than three identical nucleotides in consecutive positions (21,22). The second category includes features defined by the thermodynamics of the siRNA, e.g. the binding energy in the n7–n11 region is between –1.97 and –1.65 kcal/mol (23), and features surrounding the concept of siRNA duplex terminal asymmetry, e.g. the difference in binding energy between the n16–n19 region and n1–n4 region is greater than 1 kcal/mol (24). The third category comprises features defined by the target sites on the mRNA, including target location-related features, e.g. the target site is outside of the third quartile of the coding region of the mRNA (25), and features focusing on the target site accessibility (26,27), e.g. the local free energy of the most stable structure is greater than or equal to –20.9 kcal/mol (28). Moreover, recent studies suggested that factors related to experimental settings, e.g. the types of siRNA constructs (29,30), the types of cells used (30–34) as well as the methods applied in examining gene products (35), might also influence the efficacy of the RNAi experiments. A number of siRNA design tools were established in which various combinations of these features were implemented [see recent reviews, e.g. (15,36)].
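To make the sequence-feature category concrete, the following is a minimal Python sketch of how the three example sequence features quoted above could be computed for a single siRNA. It is an illustration only, not the implementation used by siRecords or by any of the cited design tools. The function names are hypothetical, and the sketch assumes that the sequence is given 5'→3' and that "position 6" is counted from the 5' end of the sequence as supplied.

```python
import re


def gc_content(seq: str) -> float:
    """Return the fraction of G/C bases in seq (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def has_long_run(seq: str, max_run: int = 3) -> bool:
    """Return True if any base occurs more than max_run times in a row."""
    # (.)\1{3,} matches one base followed by at least three copies of itself,
    # i.e. a run of four or more identical bases when max_run is 3.
    return re.search(r"(.)\1{%d,}" % max_run, seq.upper()) is not None


def sequence_features(sirna: str) -> dict:
    """Evaluate the three example sequence features from the text.

    Assumption: the siRNA is written 5'->3'; DNA-style 'T' is read as 'U'.
    """
    s = sirna.upper().replace("T", "U")
    return {
        "a_at_position_6": len(s) >= 6 and s[5] == "A",
        "gc_between_30_and_52_percent": 0.30 <= gc_content(s) <= 0.52,
        "no_run_longer_than_3": not has_long_run(s),
    }


if __name__ == "__main__":
    # Arbitrary 19-mer used purely for illustration.
    print(sequence_features("GUCAUAGCUGAUCCAUGAA"))
```

Thermodynamic and target-site features would require an RNA folding or nearest-neighbor energy model and are deliberately left out of this sketch.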
However, controversy continues as to which of these features are truly helpful in selecting high-efficacy siRNAs. Meanwhile, it has been increasingly recognized that many earlier siRNA design studies suffered from the ‘overfitting’ problem (14,37,38). This term, commonly used in the machine learning field, refers to situations where, as a consequence of excessive training, the performance of a classifier keeps improving on the training data but worsens on testing data. The only practical way to overcome the overfitting problem is to make use of a large and diverse training dataset (one that approximates the ultimate ‘testing data’, i.e. the general siRNA experimental practice as a whole) when investigating features or factors associated with higher siRNA efficacy. We present siRecords (http://siRecords.umn.edu/siRecords), an extensive database of mammalian RNAi experiments with consistent efficacy ratings. Because siRecords hosts the records of all kinds of siRNA experiments conducted with various laboratory techniques and experimental settings, it is a faithful representation of the general, diverse siRNA experimental practice. Recently, using a dataset compiled from siRecords, we analyzed a large number of reported features for their ability to improve RNAi effectiveness. By carefully combining the most significant features, we derived a bundle of siRNA design rule sets (called the DRM rule sets), which were subsequently shown to outperform a number of established siRNA design tools in selecting effective siRNAs (14). This work demonstrated the usefulness of the siRecords database. In this article, we outline the design considerations of the siRecords database, its structure and features, and describe the recent improvements made in the siRecords project.

DATABASE CONTENT

siRecords is designed to serve two different purposes: (i) it provides a large and diverse dataset of experimentally validated siRNAs with consistent efficacy ratings, which bioinformatics scientists can use to develop more reliable siRNA design tools, and (ii) it helps experimental RNAi researchers directly by showing which siRNAs have been tested by other researchers against their genes of interest, and what efficacy levels were achieved in those previous RNAi experiments. The literature curation and data recording procedures have remained unchanged over the past four years. First, queries are sent to the PubMed database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed) for publications related to ‘RNAi’ and ‘siRNA’. Then, the abstracts of the publications are screened, and the full-text articles likely to contain information about RNAi gene silencing experiments are retrieved and further examined. Next, for each article containing descriptions of RNAi experiments, the siRNA sequences, the target genes and other key information about experimental conditions are recorded. This information includes the cells or tissues in which the RNAi experiments were conducted, the forms of the siRNA agents (chemically synthesized oligos or vector-transfected shRNAs), and the methods applied in testing the efficacy of the siRNAs (western blot, RT–PCR or others). The siRNA sequences are aligned with the mRNA sequences of the target genes using bl2seq (http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi), and the aligned sequences are recorded. Moreover, an efficacy rating is assigned to each RNAi experiment, based on the description of the result of the gene silencing experiment given in the article. The efficacy rating scheme was designed with balanced considerations.
A very coarse-grained rating scheme (for example, a binary scheme that rates siRNAs as ‘effective’ or ‘ineffective’) would make the database less useful because of the limited information it provides. On the other hand, a very fine-grained rating scheme (for example, one that classifies siRNAs into 10 efficacy categories) would make it difficult to obtain accurate ratings, resulting in a less reliable database. We balanced these two factors and chose a four-level rating scheme, in which the efficacy of an RNAi experiment is rated as ‘very high’ if the gene product is reduced by more than 90%; ‘high’ if the gene product is reduced by 70–90%; ‘medium’ if between 50% and 70% of gene knockdown is achieved; and ‘low’ if less than 50% of gene knockdown is obtained. The informative sentences in the original articles describing the siRNA efficacy are copied and stored in the ‘original_assessment’ field in the database. When adequate textual descriptions of the siRNA efficacy are not available, best efforts are made to assign the efficacy rating based on the figures (gel images or summary bar-graphs) presented in the articles, and this information (the basis of the efficacy rating assignment) is also kept in the ‘original_assessment’ field in the database. During the data deposition process, the siRNA sequence maintained in the database may undergo some transformations relative to the original publication. First, DNA bases from the published resource may be deposited as RNA; that is, one or more bases represented as ‘T’ may be transformed into ‘U’. Second, the sense strand (passenger strand) of the siRNA may be deposited rather than the guide strand. These are known issues that are being actively corrected, but the data are currently heterogeneous as to whether these transformations have occurred or have been corrected.
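The two curation rules described above (the four-level rating scheme and the T/U and strand transformations) can be sketched in Python as follows. This is a hedged illustration, not the siRecords curators' actual procedure. The function names are hypothetical. The text leaves the boundary values open, so the sketch assumes that exactly 90% and exactly 70% are rated ‘high’ and exactly 50% is rated ‘medium’. The strand check uses exact substring matching as a stand-in for the bl2seq alignment, so it ignores overhangs and mismatches.

```python
from typing import Optional

# Watson-Crick complement table for RNA.
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def rate_efficacy(knockdown_percent: float) -> str:
    """Map a percent reduction of the gene product to the four-level rating.

    Assumed boundaries: >90 'very high', 70-90 'high',
    50 to <70 'medium', <50 'low'.
    """
    if not 0 <= knockdown_percent <= 100:
        raise ValueError("knockdown_percent must be between 0 and 100")
    if knockdown_percent > 90:
        return "very high"
    if knockdown_percent >= 70:
        return "high"
    if knockdown_percent >= 50:
        return "medium"
    return "low"


def to_rna(seq: str) -> str:
    """Normalize a sequence to upper-case RNA letters (T becomes U)."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    """Return the reverse complement of an RNA sequence."""
    return to_rna(rna).translate(_COMPLEMENT)[::-1]


def deposited_strand(sirna: str, mrna: str) -> Optional[str]:
    """Guess which strand was deposited, using exact matching only.

    The sense (passenger) strand has the same sequence as the mRNA target
    site; the guide (antisense) strand is its reverse complement.
    Returns 'sense', 'antisense', or None if neither matches exactly.
    """
    s, m = to_rna(sirna), to_rna(mrna)
    if s in m:
        return "sense"
    if reverse_complement(s) in m:
        return "antisense"
    return None
```

A record whose strand cannot be determined this way (return value `None`) would need a real alignment, for example because of 3' overhangs that are not part of the target site.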
Future releases of siRecords will contain estimates of the degree to which we believe the contents are clean or contain specific kinds of contaminating or transformed data. There are four major tables in the database schema: SiRecord, which stores the siRNA sequence, key experimental conditions (cell or tissue type, host species, method of making/delivering siRNAs, method of testing efficacy and the test object), original efficacy assessment (sentences related to efficacy assessment in the original articles), and the efficacy rating assigned by the siRecords curator; Gene, which stores information about the genes targeted by the siRNAs, including Genbank accession, organism and description of the gene; Correspondent, which stores the contact information of the siRNA origin; and Publication, which stores key information, including PubMed ID and citation data of the original publication. The current release of siRecords hosts the records of 17,192 RNAi experiments targeting 5086 unique genes, curated from 6122 research articles. The size of the database has more than quadrupled compared to the first release of the database (Figure 1).
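As an illustration of the four-table layout described above, here is a minimal sketch using Python's built-in sqlite3 module. The real database is implemented in MySQL and its actual column names and keys are not given in this article. Only the four table names and the ‘original_assessment’ field come from the text; every other column name, the foreign keys and the helper functions are assumptions made for this example.

```python
import sqlite3

# Hypothetical column layout; only the table names and
# 'original_assessment' are taken from the article.
SCHEMA = """
CREATE TABLE Gene (
    gene_id           INTEGER PRIMARY KEY,
    genbank_accession TEXT NOT NULL,
    organism          TEXT,
    description       TEXT
);
CREATE TABLE Publication (
    publication_id INTEGER PRIMARY KEY,
    pubmed_id      INTEGER,
    citation       TEXT
);
CREATE TABLE Correspondent (
    correspondent_id INTEGER PRIMARY KEY,
    name             TEXT,
    email            TEXT
);
CREATE TABLE SiRecord (
    record_id           INTEGER PRIMARY KEY,
    sirna_sequence      TEXT NOT NULL,
    cell_or_tissue      TEXT,
    host_species        TEXT,
    production_method   TEXT,
    test_method         TEXT,
    test_object         TEXT,
    original_assessment TEXT,
    efficacy_rating     TEXT CHECK (
        efficacy_rating IN ('very high', 'high', 'medium', 'low')
    ),
    gene_id          INTEGER REFERENCES Gene(gene_id),
    publication_id   INTEGER REFERENCES Publication(publication_id),
    correspondent_id INTEGER REFERENCES Correspondent(correspondent_id)
);
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create an empty database with the sketched four-table schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def records_for_accession(conn: sqlite3.Connection, accession: str) -> list:
    """Return (sequence, rating, PubMed ID) for all records of one gene."""
    cursor = conn.execute(
        "SELECT s.sirna_sequence, s.efficacy_rating, p.pubmed_id "
        "FROM SiRecord AS s "
        "JOIN Gene AS g ON g.gene_id = s.gene_id "
        "LEFT JOIN Publication AS p ON p.publication_id = s.publication_id "
        "WHERE g.genbank_accession = ? "
        "ORDER BY s.record_id",
        (accession,),
    )
    return cursor.fetchall()
```

Keeping genes, publications and correspondents in their own tables lets several RNAi records point at the same gene or article without repeating that information in each record.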
Figure 1. Statistics summary of the records of RNAi experiments in the current release of siRecords. (a) Species distribution. The category ‘monkeys’ comprises several species, including Aotus trivirgatus, Cercopithecus aethiops and Macaca mulatta. (b) Efficacy rating distribution. (c) Distribution of the siRNA lengths. (d) Distribution of the methods by which the siRNAs are produced and delivered.

The web interface of the database has recently been rewritten. The improved interface includes a ‘siRNA Input Wizard’, which guides data contributors in submitting their own records of RNAi experiments. Moreover, the primitive siRNA design tool incorporated in the previous release of siRecords has been replaced by siDRM, a recently developed full-featured siRNA design program in which updated DRM rule sets are implemented (39).

UTILITY

siRecords can be accessed at http://siRecords.umn.edu/siRecords/. On the main page, the user can query a gene by entering its Genbank accession number or GI number, and the matching records are presented. After the user selects a record, the record display page presents all relevant information about the record, including the siRNA sequence, experimental setting, efficacy rating and the source of the record. Links to all other records targeting the same gene, and to all other records obtained from the same source, are also displayed. Data contributors can submit their own records of RNAi experiments with the help of the ‘siRNA Input Wizard’ shown in the left panel of the web site (registration is required).

DATA ACCESS

The siRecords web site is publicly accessible through the URL http://siRecords.umn.edu/siRecords. Academic users can obtain a copy of the current release of the dataset by sending an email to siRecords@biocompute.umn.edu.

IMPLEMENTATION

The siRecords database is a relational database implemented with MySQL on a Fedora II Linux system running on an Intel Core 2 Duo computer. The front-end web interface is implemented as a PHP project running under Apache 2.0.

FUNDING

University of Minnesota Graduate School and Minnesota Medical Foundation (partial); NIH/NCI (1R21CA126209, 4R33CA126209) (to T.L.). Funding for open access charge: NIH/NCI.

Conflict of interest statement. None declared.
REFERENCES (10 of 39 shown)

1.  Improved and automated prediction of effective siRNA.

Authors:  Alistair M Chalk; Claes Wahlestedt; Erik L L Sonnhammer
Journal:  Biochem Biophys Res Commun       Date:  2004-06-18       Impact factor: 3.575

2.  An algorithm for selection of functional siRNA sequences.

Authors:  Mohammed Amarzguioui; Hans Prydz
Journal:  Biochem Biophys Res Commun       Date:  2004-04-16       Impact factor: 3.575

Review 3.  Strategies for silencing human disease using RNA interference.

Authors:  Daniel H Kim; John J Rossi
Journal:  Nat Rev Genet       Date:  2007-03       Impact factor: 53.242

4.  Effect of RNA silencing of polo-like kinase-1 (PLK1) on apoptosis and spindle formation in human cancer cells.

Authors:  Birgit Spänkuch-Schmitt; Jürgen Bereiter-Hahn; Manfred Kaufmann; Klaus Strebhardt
Journal:  J Natl Cancer Inst       Date:  2002-12-18       Impact factor: 13.506

5.  Potent and specific genetic interference by double-stranded RNA in Caenorhabditis elegans.

Authors:  A Fire; S Xu; M K Montgomery; S A Kostas; S E Driver; C C Mello
Journal:  Nature       Date:  1998-02-19       Impact factor: 49.962

Review 6.  Promises and challenges in developing RNAi as a research tool and therapy for neurodegenerative diseases.

Authors:  Xu Gang Xia; Hongxia Zhou; Zuoshang Xu
Journal:  Neurodegener Dis       Date:  2005       Impact factor: 2.977

Review 7.  RNA interference technologies and their use in cancer research.

Authors:  Alex Gaither; Vadim Iourgenko
Journal:  Curr Opin Oncol       Date:  2007-01       Impact factor: 3.645

8.  Positional effects of short interfering RNAs targeting the human coagulation trigger Tissue Factor.

Authors:  Torgeir Holen; Mohammed Amarzguioui; Merete T Wiiger; Eshrat Babaie; Hans Prydz
Journal:  Nucleic Acids Res       Date:  2002-04-15       Impact factor: 16.971

9.  Integrated siRNA design based on surveying of features associated with high RNAi effectiveness.

Authors:  Wuming Gong; Yongliang Ren; Qiqi Xu; Yejun Wang; Dong Lin; Haiyan Zhou; Tongbin Li
Journal:  BMC Bioinformatics       Date:  2006-11-27       Impact factor: 3.169

Review 10.  Interfering with disease: a progress report on siRNA-based therapeutics.

Authors:  Antonin de Fougerolles; Hans-Peter Vornlocher; John Maraganore; Judy Lieberman
Journal:  Nat Rev Drug Discov       Date:  2007-06       Impact factor: 84.694
