
shRNAPred (version 1.0): An open source and standalone software for short hairpin RNA (shRNA) prediction.

Nishtha Singh, Tanmaya Kumar Sahu, Atmakuri Ramakrishna Rao, Trilochan Mohapatra.   

Abstract

Small hairpin RNAs (shRNAs) are useful in many ways, such as identification of trait-specific molecular markers, gene silencing and characterization of a species. Hardly any standalone software for shRNA prediction exists in the public domain. Hence, the software shRNAPred (1.0) is proposed here, offering a user-friendly Command-line User Interface (CUI) to predict 'shRNA-like' regions from a large set of nucleotide sequences. The software is developed using PERL version 5.12.5, taking into account parameters such as stem and loop length combinations, specific loop sequence, GC content, melting temperature, position-specific nucleotides, low-complexity filter, etc. Each parameter is assigned a specific score, and the software ranks the predicted shRNAs on the basis of these scores. The high-scoring shRNAs are reported as potential shRNAs and provided to the user as a text file. The software also allows the user to customize certain parameters while predicting specific shRNAs of interest. shRNAPred (1.0) is open-access software available to academic users. It can be downloaded freely along with a user manual, an example dataset and sample output for easy understanding and implementation. AVAILABILITY: The software is available for free at http://bioinformatics.iasri.res.in/EDA/downloads/shRNAPred_v1.0.exe.

Keywords:  Gene silencing; RNAi; shRNA; shRNA prediction

Year:  2012        PMID: 22829744      PMCID: PMC3400981          DOI: 10.6026/97320630008629

Source DB:  PubMed          Journal:  Bioinformation        ISSN: 0973-2063


Background

An shRNA is a tight hairpin turn, with a loop of 4–23 nucleotides and a stem (two antiparallel strands) of 19–29 nucleotide base pairs [1]. shRNAs are pivotal in the field of gene silencing because they are cheaper than siRNAs for large-scale studies [2]. However, currently available web-based tools fail to predict shRNAs from a large set of nucleotide sequences, and hardly any standalone software exists in the public domain for this purpose. Hence, the aim of this paper is to develop standalone software that facilitates prediction of ‘shRNA-like’ regions from voluminous genomic sequence data by considering an exhaustive list of hairpin parameters. This software will cater to the needs of researchers and scientists working in the field of RNA interference in designing shRNA.

Methodology

Initially, different properties of shRNA, such as stem and loop lengths, perfect stem complementarity, GC content, melting temperature (Tm), position-specific nucleotides and low-complexity regions, were taken into consideration while developing the script for shRNAPred (version 1.0). The script was developed using Active PERL 5.12.3. The executable file was then generated using the Perl Packager (pp) module, provided by Perl Archive Toolkit (PAR) version 0.85_01 of the Comprehensive Perl Archive Network (CPAN), in a Windows environment. In addition, the script is configured with a module for each property, and the software uses these modules according to the option selected. The parameters GC content and Tm are calculated as follows.

Calculation of GC content:

\[ \text{GC content (\%)} = \frac{(C_{count} + G_{count}) \times 100}{2 \times Stem\_Length + Loop\_Length} \]

Calculation of Tm:

a) if length > 15 [3]:

\[ \mathrm{Tm}\,(^{\circ}\mathrm{C}) = 64.9 + \frac{41 \times (nG + nC - 16.4)}{nA + nT + nG + nC} \]

b) if length <= 15:

\[ \mathrm{Tm}\,(^{\circ}\mathrm{C}) = 2 \times (nA + nT) + 4 \times (nG + nC) \]

where nA, nT, nG and nC are the numbers of A, T, G and C nucleotides, respectively.
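The two calculations above can be sketched in a few lines of Python. This is an illustrative sketch, not the shRNAPred Perl code: the function names are hypothetical, and it assumes both quantities are computed over the full hairpin sequence (sense stem + loop + antisense stem), so that the sequence length equals 2 × stem length + loop length.

```python
def gc_content(seq: str) -> float:
    """GC percentage; len(seq) stands in for 2*Stem_Length + Loop_Length (assumption)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) * 100 / len(seq)


def melting_temperature(seq: str) -> float:
    """Tm in degrees C, using the length-dependent formulas given in the text."""
    seq = seq.upper().replace("U", "T")  # treat RNA input as DNA (assumption)
    n_a, n_t, n_g, n_c = (seq.count(base) for base in "ATGC")
    if len(seq) > 15:
        return 64.9 + 41 * (n_g + n_c - 16.4) / (n_a + n_t + n_g + n_c)
    return 2 * (n_a + n_t) + 4 * (n_g + n_c)
```

For a 4-nucleotide sequence such as ATGC the short-sequence rule applies, giving 2 × 2 + 4 × 2 = 12.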

Scoring System:

Different scores are assigned to each parameter by considering various favorable and unfavorable properties of shRNA. Favorable properties are assigned positive scores, whereas unfavorable properties are assigned penalties. For Tm, a score of +1 is given when it lies in the range 20°C-60°C [4]. Similarly, for GC content a score of +1 is given in the range 35%-60% [5]. A penalty of -1 is applied when poly A or poly C is present, and a score of +1 is given otherwise [6]. A penalty of -0.2 is added for each complementary base pair in the loop sequence, since the more complementary bases the sequence contains, the lower the chance that it forms a loop. The presence or absence of certain nucleotides at specific positions in the stem often increases the efficacy of the shRNA [7, 8]. Hence, scores are given for the following properties: A at the 3rd position of the 5' sense strand (1, 0); T at the 10th position of the 5' sense strand (1, 0); G/C at the 1st position of the 5' sense strand (1, -1); A/T at the 19th position of the 5' sense strand (1, -1); G at the 13th position of the 5' sense strand (-1, 0); T at the 13th position of the 5' sense strand (1, 0); T at the 1st position of the 5' antisense strand (2, 0); and an A/U nucleotide in any of the first five positions of the 5' antisense strand (1, 0). In each parenthesis, the first value is the score for presence and the second value the score for absence of the nucleotide at the given position on the 5' sense or antisense strand. Based on these parametric scores, a total score is computed for each shRNA-like region. Ranks are then assigned to these regions according to the magnitude of the score, with rank 1 given to the highest-scoring region.
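As a rough illustration, the scoring rules could be combined as in the Python sketch below. Several details are not specified in the text and are assumptions here: a poly A or poly C stretch is taken to be a run of at least four identical bases (`poly_run`, a hypothetical parameter) on either strand of the hairpin, values outside the Tm and GC ranges score 0, loop complementarity is counted as consecutive Watson-Crick pairs from the two loop ends inward, and the antisense strand is the reverse complement of the sense strand. All function names are hypothetical.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# (strand, 1-based position, bases, score if present, score if absent)
POSITION_RULES = [
    ("sense", 3, "A", 1, 0),
    ("sense", 10, "T", 1, 0),
    ("sense", 1, "GC", 1, -1),
    ("sense", 19, "AT", 1, -1),
    ("sense", 13, "G", -1, 0),
    ("sense", 13, "T", 1, 0),
    ("antisense", 1, "T", 2, 0),
]


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def loop_end_pairs(loop):
    """Count complementary pairs from the loop ends inward (assumed definition)."""
    pairs, i, j = 0, 0, len(loop) - 1
    while i < j and loop[i].translate(COMPLEMENT) == loop[j]:
        pairs, i, j = pairs + 1, i + 1, j - 1
    return pairs


def score_hairpin(sense, loop, tm, gc_percent, poly_run=4):
    sense = sense.upper().replace("U", "T")
    loop = loop.upper().replace("U", "T")
    strands = {"sense": sense, "antisense": reverse_complement(sense)}
    hairpin = sense + loop + strands["antisense"]

    score = 0.0
    score += 1 if 20 <= tm <= 60 else 0          # Tm range
    score += 1 if 35 <= gc_percent <= 60 else 0  # GC range
    poly = re.compile("A{%d,}|C{%d,}" % (poly_run, poly_run))
    has_poly = poly.search(hairpin) or poly.search(reverse_complement(hairpin))
    score += -1 if has_poly else 1               # poly A / poly C penalty
    score -= 0.2 * loop_end_pairs(loop)          # loop complementarity penalty
    for strand, pos, bases, present, absent in POSITION_RULES:
        s = strands[strand]
        score += present if len(s) >= pos and s[pos - 1] in bases else absent
    # A/U (A/T in DNA) in any of the first five antisense positions
    score += 1 if any(b in "AT" for b in strands["antisense"][:5]) else 0
    return score
```

For example, `score_hairpin("GCAGCTAGCTAGTACGTAA", "TTCAAGAGA", tm=50, gc_percent=45)` collects +3 from the Tm, GC and poly checks, -0.2 for one loop-end pair and +8 from the position rules, for a total of 10.8.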

Software input / output

Input:

The software accepts the input sequence file in FASTA format. The header line of every sequence should contain the Gene ID, accession number and definition, separated by pipe characters (|), and multiple sequences must be separated by a new line.
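A minimal Python sketch of reading such an input file is shown below. It assumes the header fields appear in the order listed above (Gene ID, accession number, definition); the function name and the example headers are hypothetical and not taken from the software or its example dataset.

```python
def read_fasta(lines):
    """Yield (gene_id, accession, definition, sequence) for each FASTA record."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank lines between records are skipped
        if line.startswith(">"):
            if header is not None:
                yield (*header, "".join(chunks))
            # split into at most three fields so the definition may contain "|"
            fields = [f.strip() for f in line[1:].split("|", 2)]
            fields += [""] * (3 - len(fields))
            header, chunks = tuple(fields), []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield (*header, "".join(chunks))


example = """>1234|NM_000001|hypothetical example gene
ACGTAC
GTTA

>5678|NM_000002|another hypothetical gene
GGCCA
""".splitlines()

records = list(read_fasta(example))
```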

Options:

The software provides three different options and the user can choose one option at a time for predicting shRNA from the input.

User defined Stem and loop lengths:

Under this option, the user enters the stem length and loop length of their choice. The user must also provide the range of GC content and the number of loop-end complementary residues.

Predefined Stem and loop length combinations:

In this case, users need not provide the stem and loop lengths, as the stem and loop length combinations, viz. 29-4, 19-9, 19-10, 21-9, 25-10, 19-4, 29-9 and 27-4 (Table 1, see supplementary material), are predefined based on the literature. Here too, the user is required to enter the range of GC content and the number of loop-end complementary residues, as in the option "User defined Stem and loop lengths".

Specific Loop Sequences (SLS):

This option first prompts the user to enter the stem length. It then asks the user either to choose a loop sequence from a set of literature-based SLS or to define a new SLS. The literature-based SLS are TTAA, TTCG, CCACC, CTCGAG, AAGCUU, CCACACC, TTCAAGAGA, AAGTTCTCT, AAGTTCTCT, TTTGTGTAG, GAAGCTTG, CTTCCTGTCA, TCAAGAG, GTGTGCTGTCC, TTCAAGAAC, TTGTGAGA (Table 2, see supplementary material). This option also prompts the user to enter the GC content range as an additional parameter. After successful execution, the software displays the total number of shRNAs predicted and the name of the generated output file. Figure 1 and Figure 2 show an illustration and the flowchart of the software.
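When the loop sequence is fixed, one plausible strategy is to locate every occurrence of the loop in the input and then test whether the flanking regions form a perfectly complementary stem. The Python sketch below illustrates this idea under that assumption; it is not taken from the shRNAPred source, the function name is hypothetical, and U is converted to T so that loop sequences written in RNA notation can be matched against DNA input.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_sls_hairpins(seq, stem_len, loop_seq):
    """Return (1-based start, sense, loop, antisense) for hairpins with a given loop."""
    seq = seq.upper().replace("U", "T")
    loop_seq = loop_seq.upper().replace("U", "T")
    hits = []
    pos = seq.find(loop_seq)
    while pos != -1:
        left = pos - stem_len
        right = pos + len(loop_seq) + stem_len
        if left >= 0 and right <= len(seq):
            sense = seq[left:pos]
            antisense = seq[pos + len(loop_seq):right]
            if antisense == reverse_complement(sense):
                hits.append((left + 1, sense, loop_seq, antisense))
        pos = seq.find(loop_seq, pos + 1)  # allow overlapping loop occurrences
    return hits


# Toy sequence with a 7-nt stem around the loop TTCG
sls_hits = find_sls_hairpins("TTGATTACATTCGTGTAATCAA", stem_len=7, loop_seq="TTCG")
```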

Figure 1

Screenshot illustrating the software.

Figure 2

Flowchart for shRNAPred (version 1.0). SL: shRNA having stem and loop lengths of user's choice (option “1”); SLC: shRNA having stem and loop length combinations from literature (option “2”); SLS: shRNA with stem length of user's choice and specific loop sequences (option “3”); GC: range of guanine-cytosine content; LTSLS: literature-based specific loop sequences; CER: complementary end residues; CRO: choose right option; DNM: does not matter; SF: re-execution with same file; NF: re-execution with new file; Y: yes; N: no.

Output:

The output file is created in the directory containing the input file. It is a tab-delimited text file containing the shRNA information, viz. stem length, loop length, shRNA sequence, position of the shRNA in the input sequence, GC content, Tm, shRNA score, presence of polyAs and polyCs in both positive and negative strands, number of complementary nucleotides in the loop sequence, accession number of the mRNA, gene ID and definition of the mRNA sequence. All columns of the output can easily be imported into Excel, Access and MySQL databases. The program also generates a log file listing the sequence accessions in which no shRNA was predicted.
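Ranking predictions by score and writing them as tab-delimited text might look like the Python sketch below. The column names and the subset of columns are hypothetical (the actual column headers of the shRNAPred output are not given here), and tied scores simply receive consecutive ranks, which is an assumption.

```python
import csv
import io

# Hypothetical column names covering a subset of the fields described above
COLUMNS = ["rank", "accession", "position", "stem_length",
           "loop_length", "sequence", "score"]


def write_ranked(predictions, handle):
    """Sort predictions by descending score and write tab-delimited rows."""
    ranked = sorted(predictions, key=lambda p: p["score"], reverse=True)
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(COLUMNS)
    for rank, p in enumerate(ranked, start=1):  # rank 1 = highest score
        writer.writerow([rank] + [p[c] for c in COLUMNS[1:]])
    return ranked


predictions = [
    {"accession": "NM_000001", "position": 3, "stem_length": 7,
     "loop_length": 4, "sequence": "GATTACATTCGTGTAATC", "score": 2.8},
    {"accession": "NM_000002", "position": 10, "stem_length": 7,
     "loop_length": 4, "sequence": "CATTAGATTCGTCTAATG", "score": 5.0},
]
buffer = io.StringIO()  # stands in for the output file
write_ranked(predictions, buffer)
```

A file written this way opens directly in a spreadsheet, and each row can be split on the tab character for loading into a database table.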

Caveat and future development

In the next version, a web-based version of the software will be developed, with an additional module for predicting shRNAs from all possible stem and loop length combinations.
  7 in total

Review 1.  Killing the messenger: short RNAs that silence gene expression.

Authors:  Derek M Dykxhoorn; Carl D Novina; Phillip A Sharp
Journal:  Nat Rev Mol Cell Biol       Date:  2003-06       Impact factor: 94.444

2.  Rational siRNA design for RNA interference.

Authors:  Angela Reynolds; Devin Leake; Queta Boese; Stephen Scaringe; William S Marshall; Anastasia Khvorova
Journal:  Nat Biotechnol       Date:  2004-02-01       Impact factor: 54.908

3.  A Web-based design center for vector-based siRNA and siRNA cassette.

Authors:  Luquan Wang; Forest Y Mu
Journal:  Bioinformatics       Date:  2004-03-04       Impact factor: 6.937

4.  Shortcomings of short hairpin RNA-based transgenic RNA interference in mouse oocytes.

Authors:  Lenka Sarnova; Radek Malik; Radislav Sedlacek; Petr Svoboda
Journal:  J Negat Results Biomed       Date:  2010-10-12

5.  Hybridization of synthetic oligodeoxyribonucleotides to phi chi 174 DNA: the effect of single base pair mismatch.

Authors:  R B Wallace; J Shaffer; R F Murphy; J Bonner; T Hirose; K Itakura
Journal:  Nucleic Acids Res       Date:  1979-08-10       Impact factor: 16.971

6.  Cation-dependent transition between the quadruplex and Watson-Crick hairpin forms of d(CGCG3GCG).

Authors:  C C Hardin; T Watson; M Corregan; C Bailey
Journal:  Biochemistry       Date:  1992-01-28       Impact factor: 3.162

7.  Integrated siRNA design based on surveying of features associated with high RNAi effectiveness.

Authors:  Wuming Gong; Yongliang Ren; Qiqi Xu; Yejun Wang; Dong Lin; Haiyan Zhou; Tongbin Li
Journal:  BMC Bioinformatics       Date:  2006-11-27       Impact factor: 3.169
