
Designing of highly effective complementary and mismatch siRNAs for silencing a gene.

Firoz Ahmed, Gajendra P S Raghava.

Abstract

Numerous methods have been developed for predicting the efficacy of short interfering RNA (siRNA). However, these methods predict the efficacy of siRNAs that are fully complementary to a gene. To the best of the authors' knowledge, no method has been developed for predicting the efficacy of mismatch siRNAs against a gene. In this study, a systematic attempt has been made to identify highly effective complementary as well as mismatch siRNAs for silencing a gene. Support vector machine (SVM) based models have been developed for predicting the efficacy of siRNAs using composition, binary, and hybrid patterns of siRNAs. We achieved a maximum correlation of 0.67 between the predicted and actual efficacy of siRNAs using the hybrid model. All models were trained and tested on a dataset of 2182 siRNAs, and performance was evaluated using five-fold cross-validation. The performance of our method, desiRm, is comparable to other well-known methods. In this study, an attempt has been made for the first time to design mutant siRNAs (mismatch siRNAs). In this approach, we mutated a given siRNA at all possible positions with all possible nucleotides. The efficacy of each mutated siRNA is predicted using desiRm. It is well known from the literature that mismatches between siRNA and target affect silencing efficacy. We have therefore incorporated rules derived from experimental base-mismatch data to estimate the overall efficacy of mutated or mismatch siRNAs. Finally, we developed a web server, desiRm (http://www.imtech.res.in/raghava/desirm/), for designing highly effective siRNAs for silencing a gene. This tool will be helpful for designing siRNAs that degrade the disease isoform of a gene carrying a heterozygous single nucleotide polymorphism without depleting the wild-type protein.


Year:  2011        PMID: 21853133      PMCID: PMC3154470          DOI: 10.1371/journal.pone.0023443

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

RNA interference (RNAi) is a natural mechanism that evolved in complex organisms to regulate gene expression. This mechanism also provides defense against viruses and transposable elements to maintain genome integrity [1]. There has been increasing interest in harnessing this mechanism to silence a specific mRNA. RNAi is triggered whenever a cell encounters long dsRNA molecules, which the Dicer enzyme subsequently cleaves into small interfering RNAs (siRNAs). An siRNA is a ∼21 nucleotide (nt) long dsRNA with a 2 nt overhang at the 3′-end. Afterwards, the siRNA is unwound and one strand associates with a nuclease-containing protein complex (RISC). Subsequently, the siRNA-containing RISC binds to the complementary mRNA and promotes its cleavage and degradation [2]. siRNAs have become an important tool for silencing a gene of interest and are emerging as potential therapeutics. The power of the system lies in its sequence specificity towards a particular gene, its quick effect, and its cost effectiveness. Importantly, it makes large-scale functional genomics studies feasible. It has been shown that the knockdown effect (efficacy) of an siRNA varies with the target site on the mRNA, and hence only a very limited set of siRNAs show high efficacy [3]. Huesken et al. analyzed experimental data to understand the relationship between siRNA sequence and silencing effect on 34 mRNA species [4]. They also developed an Artificial Neural Network (ANN) based method, BIOPREDsi, and achieved a maximum correlation of 0.66 between actual and predicted efficacy [4]. A number of other methods have been developed for predicting siRNA efficacy [5], [6], [7], [8], [9]. A recent study evaluated the performance of various methods and showed that BIOPREDsi, ThermoComposition21, and DSIR are highly accurate and reliable [8], [10]. Initially it was believed that a fully complementary siRNA is needed to silence a target gene. 
However, studies have shown that an siRNA behaves like an miRNA and suppresses protein synthesis when it is not fully complementary to the target, indicating that mismatches are tolerated during target selection by siRNA [2], [11]. This phenomenon also raises the important problem of off-target effects, where unintended target genes are suppressed by the siRNA [12], [13], [14]. One study indicates that the seed region of the siRNA, nucleotides 2-8 from the 5′-end, is important for target finding, and that a single mismatch within the seed region can change the off-target transcripts without affecting the silencing efficiency of the original target transcript [14]. Initially, off-target sequences were searched using similarity-based methods against mRNA sequence databases, but this strategy was not successful because the level of sequence similarity required for an off-target effect was not known. Several studies were conducted to understand the silencing effect of mismatches between siRNA and target [15], [16], [17], [18], [19], [20], [21]. The study by Du et al. revealed that the position of a mismatch generated in the target influences silencing, and categorized positions as: (a) high tolerance: a mismatch at position 1, 2, 18, or 19, which does not affect efficacy; (b) low tolerance: a mismatch at positions 5-11, which abolishes RNAi activity; and (c) moderate tolerance: the remaining positions [15]. It also showed the impact of the mismatched nucleotide and found that A:C and G:U are well-tolerated mismatches. Furthermore, the silencing effect of double-nucleotide mismatches was also studied [17]. Recently, a very systematic study was conducted using 20 siRNAs against 400 different mismatched targets to generate a model for single-nucleotide mismatches [21]. This study analyzed all combinations of mismatched siRNA:target pairs and demonstrated that efficacy can be influenced by the position and type of the mismatched nucleotide. The work also demonstrated that the most tolerated mismatch was A:C while the least tolerated was A:G, in terms of siRNA:target. 
It was observed that swapping the mismatched nucleotides at some positions dramatically changed the efficacy; e.g., at position 17 of the siRNA both A:C and C:A mismatches are well tolerated, while at position 12 only the A:C mismatch is tolerated and not C:A. Another study demonstrated the importance of creating a mismatch between the sense and antisense strands of the siRNA in order to make the siRNA more asymmetric, which improves silencing efficacy [20]. To find off-target sequences, methods have been developed that incorporate features such as seed complementary region and nucleotide mismatch to predict potential off-targets [22]. To the best of the authors' knowledge, lack of specificity of siRNA is considered a major drawback in designing any siRNA-based therapy. Investigation indicates that a large portion of an mRNA cannot be targeted by siRNA because of low efficacy [3]. This limits the choice of target sites. Furthermore, the requirement to enhance the efficacy of an siRNA against a particular target site is not fulfilled by available methods. In this study, we have examined whether this weakness of siRNA (poor specificity) can be exploited to design mutant siRNAs of desired efficacy. It is well known that not all siRNAs are equally effective even if they are fully complementary to the mRNA. On the other hand, we also know from experimental studies that a few mismatches at specific positions can be tolerated. Based on this hypothesis, a prediction method has been developed for designing effective mismatch siRNAs against an mRNA. This study has two parts: (1) the development of a model for predicting siRNA efficacy, and (2) the creation of mutations in the siRNA sequence to enhance its efficacy. This facility is accessible to the scientific community through a web-based portal at http://www.imtech.res.in/raghava/desirm/.

Methods

Datasets

The main dataset used in this study contains 2182 siRNAs. All models were trained, tested, and evaluated using five-fold cross-validation on the main dataset. This dataset was obtained from Huesken et al. [4] and has been used for developing a number of existing methods. In order to compare the performance of our method with existing methods, we obtained benchmarking data from Ichihara et al. [8]. The benchmarking data contains two datasets: (i) a training dataset of 2431 siRNAs [2182 (main dataset) + 249 (testing dataset)] taken from [4], and (ii) a testing dataset of 419 siRNAs [23], [24], [25], [26], [27].

Features used for models development

Composition based features

Nucleotide composition: The nucleotide composition captures the occurrences of different types of nucleotides, dinucleotides, trinucleotides, etc. We computed mono-, di-, tri-, and tetra-nucleotide compositions of siRNAs, which generate vectors of dimension 4 (A, C, G, and U), 16 (AA, AC, AG, AU,…, UU), 64 (AAA, AAC, AAG,…, UUU), and 256 (AAAA, AAAC, AAAG,…, UUUU), respectively.

Split nucleotide composition: Here the whole sequence was divided into two equal parts and the nucleotide composition of each part was calculated separately. The compositions of both parts were used to develop our models, so the dimension of the input vector was doubled [28]. For instance, a 21 nt sequence was divided into two nearly equal halves of 11 nt and 11 nt, the mononucleotide composition was calculated for each part, and the two were combined to form a vector of dimension 8.

Higher order nucleotide composition: In simple dinucleotide composition we considered local order (1st order), where the interaction between the ith and (i + 1)th nucleotides is taken into account. In second order dinucleotide composition, the interaction of the 1st with the 3rd nucleotide is considered, i.e. the ith and (i + 2)th. Similarly, in third order dinucleotide composition the interaction of the 1st with the 4th nucleotide is considered.
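The composition features described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names are hypothetical, compositions are expressed as fractions of windows (the text does not say whether counts or percentages were used), and the split version assumes the two 11 nt halves of a 21 nt sequence share the middle nucleotide.

```python
from itertools import product

ALPHABET = "ACGU"


def kmer_composition(seq, k):
    """Fraction of each contiguous k-mer, as 4**k values in alphabetical order."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(windows):
        counts[seq[i:i + k]] += 1  # raises KeyError on non-ACGU input
    return [counts[m] / windows for m in kmers]


def split_composition(seq, k):
    """Composition of the first and last halves, concatenated.

    Assumption: for odd lengths the halves overlap by one nucleotide,
    so a 21 nt sequence yields two 11 nt parts.
    """
    half = (len(seq) + 1) // 2
    return kmer_composition(seq[:half], k) + kmer_composition(seq[-half:], k)


def ordered_dinucleotide_composition(seq, order):
    """Pair nucleotide i with nucleotide i + order (order=1 is the plain dinucleotide)."""
    pairs = ["".join(p) for p in product(ALPHABET, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    windows = len(seq) - order
    for i in range(windows):
        counts[seq[i] + seq[i + order]] += 1
    return [counts[p] / windows for p in pairs]


if __name__ == "__main__":
    example = "CAGUGGAAAGUACAUCAGAUU"  # hypothetical 21 nt sequence
    for k in (1, 2, 3, 4):
        print(k, len(kmer_composition(example, k)))
    print("split mono:", len(split_composition(example, 1)))
```

The vector lengths printed by the usage example correspond to the dimensions quoted in the text (4, 16, 64, 256, and 8 for the split mononucleotide case).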

Position specific features

Binary pattern of nucleotides: This gives information about the occurrence of position-specific nucleotides in the siRNA sequence. Each nucleotide was represented by a binary pattern of dimension four (A by [1,0,0,0], C by [0,1,0,0], G by [0,0,1,0], and U by [0,0,0,1]). Thus, an siRNA sequence of 21 nucleotides was represented by a vector of dimension 84 (4×21).

Binary pattern of dinucleotides: Instead of considering one nucleotide, as in the binary pattern above, the occurrence of two consecutive nucleotides at a particular position was considered.

Binary of condense: The sequence was divided into two equal parts, and the binary patterns of both parts were calculated and merged into each other (like a hairpin structure) so that the 5′-end and 3′-end of the sequence are at the same position.

Hydrogen bond: Hydrogen bonding properties were encoded as “3” for G and C and “2” for A and U.

Thermodynamic: The values of the thermodynamic properties at each position were taken from [10].

Target site accessibility: Target site accessibility, in terms of the probability of being unpaired, was calculated using RNAplfold [29]. We used the parameters W = 80, L = 40, u = 16, which were considered the best parameters for differentiating between functional and non-functional siRNAs [6].

Scaling of features: In the hybrid approach, several different features were considered together, creating a large range of feature values that resulted in poor model performance [30]. Hence we normalized the values to the range 1–10 using the scaling feature of the libSVM software (http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf).
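As an illustration of the position-specific encodings, the following Python sketch builds the binary pattern, the binary pattern of dinucleotides, and the hydrogen bond pattern. The function names are hypothetical and the ordering of the 16 dinucleotides is an assumption; only the per-nucleotide codes are taken from the text.

```python
ONE_HOT = {"A": [1, 0, 0, 0], "C": [0, 1, 0, 0], "G": [0, 0, 1, 0], "U": [0, 0, 0, 1]}
H_BONDS = {"G": 3, "C": 3, "A": 2, "U": 2}
DINUCLEOTIDES = [a + b for a in "ACGU" for b in "ACGU"]  # assumed ordering


def binary_pattern(seq):
    """Four bits per position: 84 values for a 21 nt sequence."""
    vec = []
    for nt in seq:
        vec.extend(ONE_HOT[nt])
    return vec


def binary_dinucleotide_pattern(seq):
    """Sixteen bits per pair of consecutive nucleotides: 320 values for 21 nt."""
    vec = []
    for i in range(len(seq) - 1):
        block = [0] * 16
        block[DINUCLEOTIDES.index(seq[i:i + 2])] = 1
        vec.extend(block)
    return vec


def hydrogen_bond_pattern(seq):
    """One value per position: 3 for G or C, 2 for A or U."""
    return [H_BONDS[nt] for nt in seq]


if __name__ == "__main__":
    example = "CAGUGGAAAGUACAUCAGAUU"  # hypothetical 21 nt sequence
    print(len(binary_pattern(example)))
    print(len(binary_dinucleotide_pattern(example)))
    print(hydrogen_bond_pattern(example))
```

For a 21 nt input these encodings have lengths 84, 320, and 21, which agree with the vector sizes listed for the corresponding models in Table 2.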

Prediction approaches

In order to develop models for siRNA efficacy prediction, various features of siRNAs were used. SVMlight [31] was used for model development.

Performance measures

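To evaluate the performance of our models, we used the following standard parameters: (i) correlation coefficient (R), (ii) coefficient of determination (R2), (iii) mean absolute error (MAE), and (iv) root mean squared error (RMSE). All models were evaluated using five-fold cross-validation.

$$R = \frac{n\sum_{i=1}^{n} E_i^{pred} E_i^{act} - \sum_{i=1}^{n} E_i^{pred} \sum_{i=1}^{n} E_i^{act}}{\sqrt{n\sum_{i=1}^{n} \left(E_i^{pred}\right)^2 - \left(\sum_{i=1}^{n} E_i^{pred}\right)^2}\;\sqrt{n\sum_{i=1}^{n} \left(E_i^{act}\right)^2 - \left(\sum_{i=1}^{n} E_i^{act}\right)^2}}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} \left(E_i^{pred} - E_i^{act}\right)^2}{\sum_{i=1}^{n} \left(E_i^{act} - \bar{E}^{act}\right)^2}$$

$$MAE = \frac{1}{n}\sum_{i=1}^{n} \left|E_i^{pred} - E_i^{act}\right|$$

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \left(E_i^{pred} - E_i^{act}\right)^2}$$

Here n is the size of the test set, E_i^{pred} and E_i^{act} are the predicted and actual efficacy of the ith siRNA, respectively, and \bar{E}^{act} is the average actual efficacy in the test set.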
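The four measures named above can be computed as in the sketch below. It assumes the usual definitions (Pearson correlation for R, and R2 as one minus the ratio of residual to total sum of squares, which is consistent with the negative R2 value that appears in Table 1); the function name is hypothetical.

```python
import math


def evaluate(pred, act):
    """Return R, R2, MAE and RMSE for predicted vs. actual efficacies."""
    n = len(act)
    mean_p = sum(pred) / n
    mean_a = sum(act) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(pred, act))
    ss_p = sum((p - mean_p) ** 2 for p in pred)
    ss_a = sum((a - mean_a) ** 2 for a in act)          # total sum of squares
    ss_res = sum((p - a) ** 2 for p, a in zip(pred, act))  # residual sum of squares
    return {
        "R": cov / math.sqrt(ss_p * ss_a),   # undefined if either vector is constant
        "R2": 1.0 - ss_res / ss_a,           # can be negative for poor models
        "MAE": sum(abs(p - a) for p, a in zip(pred, act)) / n,
        "RMSE": math.sqrt(ss_res / n),
    }


if __name__ == "__main__":
    # Toy numbers for illustration only, not data from the study.
    print(evaluate([0.1, 0.3, 0.5], [0.2, 0.4, 0.6]))
```

Note that R and R2 are computed independently here, so R2 need not equal the square of R; a prediction that is perfectly correlated but systematically shifted has R = 1 and R2 < 1.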

Results

All models were trained and tested on the main dataset, which consists of 2182 siRNAs, each 21 nucleotides long. In this strategy the dataset was randomly divided into five sets; four sets were used for training and the remaining set for testing. This process was repeated five times so that each set was used once for testing.
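The five-fold procedure just described can be sketched as follows. This is a generic illustration with a hypothetical function name and a fixed random seed chosen for reproducibility; it is not the partitioning actually used in the study.

```python
import random


def five_fold_splits(n_items, n_folds=5, seed=0):
    """Yield (train_indices, test_indices) so that each item is tested exactly once."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)          # random division into sets
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        test = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test


if __name__ == "__main__":
    for train, test in five_fold_splits(2182):
        print(len(train), len(test))
```

With 2182 items the five test sets contain 437 or 436 siRNAs each, and every siRNA appears in exactly one test set.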

Composition based models

First, SVM-based models were developed using different types of nucleotide composition, and a maximum correlation of 0.574 between predicted and actual efficacy was obtained using tetranucleotide composition. The performance of all composition-based models is shown in Table 1. Similarly, SVM-based models were developed using split nucleotide composition and achieved a maximum correlation of 0.508 using trinucleotide composition. Finally, models were developed using higher order nucleotide composition and achieved a best correlation of 0.579 between predicted and actual efficacy. Compared with simple trinucleotide composition (R = 0.515), a substantial increase in correlation was observed using 2nd order trinucleotide composition (R = 0.579), which reveals the importance of the pattern of nucleotides and the influence of a single gap on efficacy (Table 1).
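Table 1

Performance of SVM-based models for siRNA efficacy prediction developed using composition based features.

| Composition | Features | Vector | R | R2 | MAE | RMSE | g | c | j |
|---|---|---|---|---|---|---|---|---|---|
| Nucleotide composition | Mono | 4 | 0.316 | 0.095 | 0.152 | 0.190 | 0.001 | 1 | 1 |
| | Di | 16 | 0.450 | 0.145 | 0.145 | 0.185 | 0.001 | 3 | 2 |
| | Tri | 64 | 0.515 | 0.248 | 0.138 | 0.173 | 0.001 | 1 | 2 |
| | Tetra | 256 | 0.574 | 0.312 | 0.131 | 0.166 | 0.0001 | 10 | 2 |
| Split nucleotide composition | Mono | 8 | 0.355 | -0.03 | 0.161 | 0.203 | 0.001 | 1 | 3 |
| | Di | 32 | 0.453 | 0.203 | 0.143 | 0.178 | 0.001 | 1 | 3 |
| | Tri | 128 | 0.508 | 0.243 | 0.137 | 0.174 | 0.0001 | 2 | 2 |
| Higher order composition | 2nd order Di | 16 | 0.420 | 0.115 | 0.149 | 0.188 | 0.001 | 1 | 2 |
| | 3rd order Di | 16 | 0.467 | 0.207 | 0.143 | 0.178 | 0.001 | 1 | 1 |
| | 4th order Di | 16 | 0.461 | 0.150 | 0.146 | 0.184 | 0.001 | 1 | 2 |
| | 2nd order Tri | 64 | 0.579 | 0.332 | 0.128 | 0.163 | 0.001 | 1 | 1 |
| | 3rd order Tri | 64 | 0.483 | 0.218 | 0.141 | 0.177 | 0.001 | 1 | 1 |
| | 2nd order Tetra | 256 | 0.502 | 0.222 | 0.139 | 0.176 | 0.0001 | 10 | 2 |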

Mono: mononucleotide; di: dinucleotide; tri: trinucleotide; tetra: tetranucleotide; R: correlation coefficient; R2: coefficient of determination; MAE: mean absolute error; RMSE: root mean squared error; g, c, and j are SVM parameters.


Models based on position specific features

One major disadvantage of the composition-based models above is that they use only the frequency of different types of nucleotides and hence do not consider the position of nucleotides in the siRNA. To overcome this problem we created binary patterns for siRNAs, which provide complete information (position and type of nucleotide). First, an SVM-based model was developed using the binary pattern of nucleotides and achieved a correlation coefficient of 0.637 (Table 2). This model outperforms all the composition-based models, which indicates the importance of the position of nucleotides in the siRNA. As shown in Table 2, SVM models were also developed using other position-specific patterns such as dinucleotide and hydrogen bond. However, we obtained the best performance using the binary pattern of nucleotides.
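Table 2

Performance of SVM-based model for siRNA efficacy prediction developed using position specific feature and our method desiRm.

| Features | Vector | R | R2 | MAE | RMSE | g | c | j |
|---|---|---|---|---|---|---|---|---|
| Binary pattern | 84 | 0.637 | 0.406 | 0.122 | 0.154 | 0.01 | 1 | 1 |
| Binary of di | 320 | 0.563 | 0.272 | 0.135 | 0.170 | 0.001 | 6 | 2 |
| Binary of Condense | 40 | 0.449 | 0.200 | 0.142 | 0.179 | 0.001 | 10 | 1 |
| AU, GC | 42 | 0.362 | 0.130 | 0.149 | 0.186 | 0.001 | 1 | 1 |
| Hydrogen bond | 21 | 0.579 | 0.335 | 0.130 | 0.163 | 0.01 | 2 | 1 |
| Thermodynamics | 19 | 0.577 | 0.332 | 0.129 | 0.163 | 0.001 | 10 | 1 |
| desiRm21 | 168 | 0.670 | 0.448 | 0.118 | 0.148 | 0.001 | 2 | 1 |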

Hybrid models

In this study we also developed models using two or more types of features, which we call hybrid models. First, hybrid models were developed using composition-based features, where two or more types of composition were combined (Table S1). Similarly, we developed hybrid models using position-specific features and found that binary pattern combined with thermodynamics achieved better performance (Table S1). We also developed hybrid models using percent nucleotide composition, nucleotide frequency, and binary pattern as input features. Finally, we achieved the highest correlation coefficient, 0.670, with a hybrid model that uses nucleotide frequency and position-specific features (Mono+Di+Tri+Binary pattern). We call this model desiRm21 in this study (Table 2).
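To make the size of the desiRm21 input concrete, the sketch below concatenates mono-, di-, and trinucleotide frequencies with the binary pattern. It is an illustration under assumptions: "nucleotide frequency" is interpreted as raw counts, the feature order is arbitrary, the function names are hypothetical, and the scaling step applied before SVM training is omitted.

```python
from itertools import product

ALPHABET = "ACGU"
ONE_HOT = {nt: [int(nt == other) for other in ALPHABET] for nt in ALPHABET}


def kmer_counts(seq, k):
    """Raw count of every k-mer (4**k values); counts rather than percentages are an assumption."""
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    return [sum(seq[i:i + k] == m for i in range(len(seq) - k + 1)) for m in kmers]


def hybrid_features(seq):
    """Mono + Di + Tri frequencies followed by the binary pattern."""
    binary = [bit for nt in seq for bit in ONE_HOT[nt]]
    return kmer_counts(seq, 1) + kmer_counts(seq, 2) + kmer_counts(seq, 3) + binary


if __name__ == "__main__":
    example = "CAGUGGAAAGUACAUCAGAUU"  # hypothetical 21 nt sequence
    print(len(hybrid_features(example)))
```

For a 21 nt sequence the vector has 4 + 16 + 64 + 84 = 168 entries, matching the vector size reported for desiRm21 in Table 2. Under the same construction a 19 nt sequence would give 4 + 16 + 64 + 76 = 160 entries.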

Comparison with existing methods

It is important to compare the performance of a newly developed method with existing methods. To compare any two methods, one should use the same dataset for training and testing. Recently, Ichihara et al. [8] compared the performance of the major existing methods. In this study we used the same data to evaluate our newly developed method, desiRm21. We trained our model on 2431 siRNAs and tested it on 419 siRNAs. As shown in Table 3, the performance of desiRm21 is comparable to that of previously developed methods.
Table 3

Performance of desiRm21 and other four algorithms on test dataset containing 419 siRNA.

| Methods | R | R2 | MAE | RMSE |
|---|---|---|---|---|
| i-Score | 0.557 | 0.217 | 0.243 | 0.284 |
| s-Biopredsi | 0.546 | 0.296 | 0.218 | 0.270 |
| Thermocomposition21 | 0.577 | 0.200 | 0.221 | 0.288 |
| DSIR | 0.555 | 0.158 | 0.222 | 0.295 |
| desiRm21 | 0.558 | 0.164 | 0.222 | 0.294 |

Increase of siRNA efficacy by base substitution

The siRNA pathway is a multistep process, and one crucial step is the integration of the guide strand into the RISC complex. The efficiency of integration depends on the sequence of the siRNA duplex, but likely not on the sequence of the target site itself [20], [30]. In this section we propose an ingenious approach for designing non-perfect siRNAs that are more efficient in the earlier steps of the process, such as RISC integration, resulting in more potent siRNAs. RNAi studies in human cells showed that effective siRNAs may range in length from 16 to 21 nt [32], and siRNAs of 19 nt have been used successfully to silence mRNAs [23], [24], [27], [33]. It has previously been shown that the performance of an siRNA prediction method developed using 19 nt sequences is very similar to that of a method developed using 21 nt sequences [8]. The first systematic experimental analysis of the effect of mismatches between siRNA and target was conducted by Liang's group [15], [17]. They used a 19 nt long siRNA targeting the human CD46 gene (XM_036622) at nucleotide positions 604–622. To gain more insight into single-nucleotide mismatches, the same group studied all combinations of base mismatch at each position of the target site [21]. They employed 20 siRNAs against ∼400 target sites and generated the most comprehensive data on the efficacy of single mutations in the target site. To make use of the results of these studies, we developed an SVM model, desiRm19. This model uses the same nucleotide features as desiRm21 but on 19 nt long sequences, which were made by removing the last two bases from the 3′-end of each 21 nt long sequence [34]. desiRm19 achieved correlation coefficients of 0.646, 0.648, and 0.553 on the training dataset and on the independent datasets of 249 and 419 sequences, respectively. The performance of desiRm19 is marginally lower than that of desiRm21 because the 19 nt long sequence carries less information. 
Several earlier investigations reported the importance of target site accessibility in the mRNA for designing effective siRNAs. Hence, we also integrated the target site accessibility feature with the nucleotide frequency and binary pattern features (desiRm19) for model development. The best SVM model (desiRm) achieved correlation coefficients of 0.647 and 0.654 on the training dataset (2182 sequences) and the independent dataset (249 sequences), respectively. A marginal improvement in performance was observed from incorporating target site accessibility information. This supports earlier findings about the importance of this feature in designing functional siRNAs [6], [35]. To obtain more potent siRNAs, we mutated every position of the 19 nt antisense strand of an siRNA, substituting each of the four nucleotides in turn. The efficacies of these mutated siRNAs were predicted using our SVM model desiRm. However, a mutant siRNA bound to the target sequence produces a mismatch, which affects silencing efficiency. Therefore, a scoring method based on the experimental data has been developed, which deducts the effect of the position and/or identity of each mismatch from the predicted efficacy to obtain the overall Mismatch Efficacy (ME) of the mutated siRNA: ME = predicted efficacy - Σ reduced efficacy (due to mismatch)

Mismatch efficacy incorporating both position and identity of nucleotide

Initially we generated single mutations, which yield 57 different variants of a single siRNA (19 positions × 3 alternative nucleotides). The change in repression caused by the position and identity of a mismatch between siRNA and target is taken from experimental data [21]. We obtained the mismatch tolerance efficacy data by personal communication with the author (Figure S4 of [21]). The mismatch efficacy is then calculated by subtracting the efficacy lost due to the mismatch from the predicted efficacy. Suppose an siRNA (CAGUGGAAAGUACAUCAGA) is designed against a target region (UCUGAUGUACUUUCCACUG) in the mRNA NM015213. The siRNA is fully complementary to the target and has actual and predicted efficacies of 0.479 [4] and 0.588, respectively (Figure 1). If C is replaced by U at the 1st position (uAGUGGAAAGUACAUCAGA), the predicted efficacy is 0.776, assuming full complementarity with the target sequence. However, the first base of the siRNA now forms a U:G mismatch with the target, which decreases efficacy by 0.066 (from [21]). Thus the mismatch efficacy of uAGUGGAAAGUACAUCAGA is 0.776 - 0.066 = 0.710. To estimate the efficacy of siRNAs with two mismatches, experimental data were taken from Dahlgren et al. [17]. A subsequent mutation gives the siRNA uAGUGGAAAGUACAaCAGA, with a mismatch efficacy of 0.929.
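The worked example above can be reproduced with a short Python sketch. The helper names are hypothetical; the SVM prediction (0.776) and the mismatch penalty (0.066) are simply the numbers quoted in the text, not values computed by this code, and G:U pairs are treated as mismatches, as in the example.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def single_mutants(antisense):
    """Yield (position, mutant) for every single-base substitution.

    Positions are 1-based from the 5' end; the mutated base is lower case.
    """
    for i, original in enumerate(antisense):
        for base in "ACGU":
            if base != original:
                yield i + 1, antisense[:i] + base.lower() + antisense[i + 1:]


def mismatches(antisense, target):
    """List (position, 'siRNA:target') for every non Watson-Crick pair.

    Both sequences are written 5' to 3', so antisense position i pairs with
    the i-th base counted from the 3' end of the target.
    """
    n = len(antisense)
    found = []
    for i, base in enumerate(antisense.upper()):
        partner = target[n - 1 - i]
        if COMPLEMENT[base] != partner:
            found.append((i + 1, f"{base}:{partner}"))
    return found


def mismatch_efficacy(predicted, penalties):
    """ME = predicted efficacy minus the summed reduction due to mismatches."""
    return predicted - sum(penalties)


if __name__ == "__main__":
    antisense = "CAGUGGAAAGUACAUCAGA"
    target = "UCUGAUGUACUUUCCACUG"
    print(len(list(single_mutants(antisense))))           # number of single mutants
    print(mismatches("uAGUGGAAAGUACAUCAGA", target))       # mismatch created by the mutation
    print(round(mismatch_efficacy(0.776, [0.066]), 3))     # values quoted in the text
```

Running the example lists 57 single mutants, identifies the U:G mismatch at position 1, and gives a mismatch efficacy of 0.71.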
Figure 1

Schematic diagram of efficacy of complementary and mismatch siRNAs against a target site.

Fully complementary siRNA has actual and predicted efficacy of 0.479 and 0.588 respectively. Single mutation at 1st position in the siRNA has predicted efficacy of 0.776 but overall efficacy due to single mismatch is 0.710 (0.776-0.066). Further mutation at 15th position in siRNA has predicted mismatch efficacy of 0.929. Base pairing is denoted by “ | ”, mismatch with “ : ”, and mutant base with small case.


Mismatch efficacy incorporating only position effect

The experimental study carried out by Dahlgren et al. used only a single siRNA against mutated targets [17]. Thus, the effects of all possible types of siRNAs and double-nucleotide mismatches were not studied. The experimental data cover 709 of the 1539 possible double-nucleotide mismatch combinations. Therefore, in the case of more than two mismatches, or when no similar experimental data are available, we incorporated only the average position-specific effect from single-nucleotide mismatch data [21]. For a position-specific mismatch, the average effect of that position was considered (Figure S1).
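The counts quoted above follow from simple combinatorics on a 19 nt siRNA, and the fallback rule can be written as a lookup. The sketch below is an assumption-laden illustration: the function names, the table layout, and all penalty values in the usage example are hypothetical, and summing per-position averages is one reading of the ME formula rather than a confirmed detail of desiRm.

```python
from math import comb


def n_single_mismatches(length=19):
    """Each position can be changed to 3 other nucleotides."""
    return length * 3


def n_double_mismatches(length=19):
    """Choose 2 positions, each with 3 alternative nucleotides."""
    return comb(length, 2) * 3 * 3


def efficacy_reduction(mismatch_list, measured, position_average):
    """Total reduction in efficacy for a list of (position, 'siRNA:target') mismatches.

    measured: dict mapping a sorted tuple of mismatches to an experimentally
              derived reduction (single or double mismatches only).
    position_average: dict mapping a position to its average reduction,
              used when no matching experimental entry exists.
    """
    key = tuple(sorted(mismatch_list))
    if len(key) <= 2 and key in measured:
        return measured[key]
    return sum(position_average[pos] for pos, _ in mismatch_list)


if __name__ == "__main__":
    print(n_single_mismatches(), n_double_mismatches())
    # Hypothetical tables for illustration only.
    measured = {((1, "U:G"),): 0.066}
    position_average = {1: 0.05, 6: 0.60, 15: 0.10}
    print(efficacy_reduction([(1, "U:G")], measured, position_average))
    print(efficacy_reduction([(1, "U:G"), (6, "C:A"), (15, "A:A")], measured, position_average))
```

The first line of output confirms the 57 single and 1539 double combinations mentioned in the text.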

Description of web server

A user-friendly web server has been developed on a SUN server under the Solaris environment using HTML, PERL, and CGI-PERL. There are two input fields. (1) Submit mRNA: effective siRNAs are identified against the submitted mRNA (Figure S2). The output is sorted in descending order of efficacy and lists the antisense sequence with its fully complementary target sequence, the position in the mRNA, the target site accessibility, and the efficacy (Figure S3). To further increase the potency of an siRNA, the user can click an antisense sequence, which is submitted automatically to generate single-mutant siRNAs and rank them by ME (Figure S4). The output shows the position of the mismatch, the mutated nucleotide, the target site accessibility, and the target sequence. (2) Submit siRNA: the increase in efficacy from mutation can also be obtained directly through this field, where the user enters a 19 nt long antisense siRNA generated by other software together with its target sequence. However, this field does not consider the target site accessibility feature during efficacy calculation. Further mutations in the siRNA can be generated by clicking the antisense sequence. This strategy can also be used to generate siRNAs with very low efficacy against an off-target.

Comparison of efficacy due to mismatch

An analysis was carried out to assess the effect on efficacy of mismatches between siRNA and target sequence. We considered mutant siRNA sequences against a particular target. Using our server, it was found that 1–2 mutations can be used to switch the efficacy of an siRNA from ineffective to effective and vice versa, although this needs experimental verification (Table 4). Therefore, to evaluate the real performance of desiRm, we tested it on 78 experimentally verified mutated siRNAs taken from Ohnishi et al. [36]. In that study, allele-specific siRNAs were designed to degrade the mutant mRNA of the human Prion Protein (PRNP) gene without depleting the wild-type transcript (Figure S5). They used the same strategy that we propose, i.e. targeting the same site with different siRNAs (each siRNA having a one-base substitution at a different position) to manipulate the efficacy of the siRNA and to obtain siRNAs that better discriminate between mutant and wild-type targets. Such an siRNA forms a single-nucleotide mismatch with the mutant target but a two-nucleotide mismatch with the wild type. They reported that introducing a base substitution at specific positions in the siRNA depleted the mutant transcript while least affecting the wild type. When we predicted the efficacy of siPrnp102(T9) and its variants with desiRm, a correlation coefficient of 0.725 was achieved between actual and predicted efficacy (Table 5). This high correlation supports the applicability of our tool to real data. Furthermore, we also applied desiRm to other sets of siRNA data and achieved correlation coefficients of 0.586, 0.607, and 0.666 between actual and predicted efficacy for siPrnp105(T10), siPrnp102(T10), and siPrnp178(A9), respectively (see Tables S2, S3, S4).
Table 4

Comparative study of increase/decrease efficacy of siRNAs by using our method, desiRm.

| siRNA antisense | Target access | Actual Efficacy | desiRm Efficacy | Mutated siRNA antisense | desiRm Efficacy | Position of Mutation |
|---|---|---|---|---|---|---|
| UCCUCACCAUCCGUCCAGU | 0.003895 | 0.465 | 0.577 | UCCUCACCcUCCGUCCAGg | 0.771 | 9, 19 |
| CUAAUAUGUUAAUUGAUUU | 0.054683 | 0.462 | 0.647 | CUAAUAUGUUAAUUGAUUg | 0.813 | 19 |
| | | | | CUAAUAUcUUAAUUGAUUg | 0.855 | 8, 19 |
| | | | | uUAAUAUGUUAAUUGAUUg | 0.909 | 1, 19 |
| CAGAUUCCACACCAUGUGG | 0.000327 | 0.402 | 0.732 | uAGAUUCCACACCAUGUGG | 0.864 | 1 |
| | | | | aAGAUUCCACACCAUGUGG | 0.923 | 1 |
| | | | | uAGAUUCCACACCAaGUGG | 1.033 | 1, 15 |
| | | | | uAGAUUCCACACCAcGUGG | 0.148 | 1, 15 |
| | | | | uAGAUcCCACACCAUGUGG | 0.061 | 1, 6 |
| GGUCCACAUUCUAUUUUAA | 0.007570 | 0.388 | 0.397 | aGUCCACAUUCUAUUUUAA | 0.628 | 1 |
| | | | | uGUCCACAUUCUAUUUUAg | 0.798 | 1, 19 |
| | | | | uGUCCACAUUCUAUUUUcg | 0.757 | 1, 18, 19 |
| CCUCACCAUCCGUCCAGUA | 0.002853 | 0.326 | 0.473 | aCUCACCAUCCGUCCAGUA | 0.653 | 1 |
| | | | | uCUCACCAUCCGUCCAGUg | 0.760 | 1, 19 |
| UGUCUACAAUCCACUGUGU | 0.008437 | 0.993 | 0.878 | UGUCUACAAaCCACUGUGU | 0.188 | 10 |
| | | | | UGUCUACAuUCCACUGUGU | 0.038 | 9 |
| AACUUCUUGGCUUUGUACU | 0.023926 | 0.995 | 0.895 | AACUUCUUGuCUUUGUACU | 0.228 | 10 |
| AACAGCUCCGGAUUCUGUG | 0.000321 | 0.978 | 0.926 | AACAGCUCCGGAUaCUGUG | 0.273 | 14 |
| | | | | AACAGCUCCcGAUUCUGUG | 0.260 | 10 |
| | | | | AACAGCUCCGGAUUaUGUG | 0.189 | 15 |
| UAGAAAUGCACACAUCACC | 0.001601 | 0.947 | 1.019 | UAGAAAUGCACAaAUCACC | 0.343 | 13 |
| AAAACUUCACUACAAAUUC | 0.008497 | 0.967 | 0.914 | AAAACUUCuCUACAAAUUC | 0.083 | 9 |
| | | | | AAAACUUCAaUACAAAUUC | 0.027 | 10 |

Sequences were taken from the Huesken data; mutated nucleotides are denoted in lower case. Target access: probability of being unpaired at the target site, calculated by RNAplfold.

Table 5

Assessment of desiRm on experimentally verified mismatched siRNAs of siPrnp102(T9).

| Name of siRNA | siRNA sequence (antisense) / Mutated sequence | # Mismatch (mRNA) | Target sequence | siRNA:Target (base mismatch position on siRNA) | Actual Efficacy | Predicted efficacy |
|---|---|---|---|---|---|---|
| siPrnp102(T9) | UGGCUUACUCAGCUUGUUC | 0 (mutant) | GAACAAGCUGAGUAAGCCA | 0 | 0.972 | 0.942 |
| siPrnp102(T9)-5U | UGGCUUACUCAGCUaGUUC | 1 (mutant) | GAACAAGCUGAGUAAGCCA | A:A(15) | 0.953 | 0.199 |
| siPrnp102(T9)-6U | UGGCUUACUCAGCaUGUUC | 1 (mutant) | GAACAAGCUGAGUAAGCCA | A:A(14) | 0.864 | 0.267 |
| siPrnp102(T9)-7C | UGGCUUACUCAGgUUGUUC | 1 (mutant) | GAACAAGCUGAGUAAGCCA | G:G(13) | 0.867 | 0.531 |
| siPrnp102(T9)-12C | UGGCUUAgUCAGCUUGUUC | 1 (mutant) | GAACAAGCUGAGUAAGCCA | G:G(8) | 0.931 | 0.645 |
| siPrnp102(T9)-13A | UGGCUUuCUCAGCUUGUUC | 1 (mutant) | GAACAAGCUGAGUAAGCCA | U:U(7) | 0.951 | 0.821 |
| siPrnp102(T9)-14U | UGGCUaACUCAGCUUGUUC | 1 (mutant) | GAACAAGCUGAGUAAGCCA | A:A(6) | 0.949 | 0.571 |
| siPrnp102(T9)-15U | UGGCaUACUCAGCUUGUUC | 1 (mutant) | GAACAAGCUGAGUAAGCCA | A:A(5) | 0.964 | 0.720 |
| siPrnp102(T9)-16C | UGGgUUACUCAGCUUGUUC | 1 (mutant) | GAACAAGCUGAGUAAGCCA | G:G(4) | 0.850 | 0.664 |
| siPrnp102(T9)-17G | UGcCUUACUCAGCUUGUUC | 1 (mutant) | GAACAAGCUGAGUAAGCCA | C:C(3) | 0.941 | 0.782 |
| siPrnp102(T9) | UGGCUUACUCaGCUUGUUC | 1 (wt) | GAACAAGCCGAGUAAGCCA | A:G (11) | 0.763 | 0.450 |
| siPrnp102(T9)-5U | UGGCUUACUCaGCUaGUUC | 2 (wt) | GAACAAGCCGAGUAAGCCA | A:A(15)/A:G (11) | 0.513 | 0.150 |
| siPrnp102(T9)-6U | UGGCUUACUCaGCaUGUUC | 2 (wt) | GAACAAGCCGAGUAAGCCA | A:A(14)/A:G (11) | 0.403 | 0.134 |
| siPrnp102(T9)-7C | UGGCUUACUCaGgUUGUUC | 2 (wt) | GAACAAGCCGAGUAAGCCA | G:G(13)/A:G (11) | 0.400 | 0.033 |
| siPrnp102(T9)-12C | UGGCUUAgUCaGCUUGUUC | 2 (wt) | GAACAAGCCGAGUAAGCCA | G:G(8)/A:G (11) | -0.041 | 0.143 |
| siPrnp102(T9)-13A | UGGCUUuCUCaGCUUGUUC | 2 (wt) | GAACAAGCCGAGUAAGCCA | U:U(7)/A:G (11) | 0.183 | 0.286 |
| siPrnp102(T9)-14U | UGGCUaACUCaGCUUGUUC | 2 (wt) | GAACAAGCCGAGUAAGCCA | A:A(6)/A:G (11) | -0.135 | 0.176 |
| siPrnp102(T9)-15U | UGGCaUACUCaGCUUGUUC | 2 (wt) | GAACAAGCCGAGUAAGCCA | A:A(5)/A:G (11) | 0.388 | 0.217 |
| siPrnp102(T9)-16C | UGGgUUACUCaGCUUGUUC | 2 (wt) | GAACAAGCCGAGUAAGCCA | G:G(4)/A:G (11) | 0.126 | 0.265 |
| siPrnp102(T9)-17G | UGcCUUACUCaGCUUGUUC | 2 (wt) | GAACAAGCCGAGUAAGCCA | C:C(3)/A:G (11) | -0.063 | 0.178 |

siPrnp102(T9) and its various mutant siRNAs were targeted against prion protein genes (PRNP) and its mutant allele (PRNP-P102L). Mutated base in siRNA is denoted by small letter while mismatch base between siRNA and target are denoted by bold letter. Data of actual efficacy of siRNAs were taken from experimental work reported by Ohnishi et al [36]. Predicted efficacy denotes efficacy of desiRm. All sequences are in 5′ to 3′ direction. Correlation coefficient between actual and predicted efficacy is R = 0.725.


Discussion

It is well known that the final siRNA efficacy is the combined contribution of the efficacy gained at each step of the RNAi pathway: loading of the guide strand into RISC, target accessibility, and cleavage efficiency [23], [30], [37], [38]. However, the degree of each contribution is not fully known. Taken together, these studies indicate that there is room to introduce mutations that make an siRNA more accessible to the different proteins involved in the RNAi pathway and thereby enhance the silencing effect. In the past, various regression methods were developed to predict siRNA efficacy using large experimental datasets. However, no method exists that can design highly effective siRNAs by generating mismatches between siRNA and target sequence. The principle of our method is to design siRNAs that gain efficacy at the various steps of the RNAi pathway and, at the last step (silencing), to incorporate the effect of the mismatch with the target site. We first developed a robust SVM model for predicting siRNA efficacy using nucleotide features. Although our method, desiRm, performed similarly to other methods, a substantial improvement in performance was not possible even when using various other nucleotide features. Several studies indicated that target site accessibility can improve siRNA efficacy [6], [35], [39]. We therefore integrated the target site accessibility feature with the nucleotide features and achieved marginally better model performance. This final model was combined with the mismatch-tolerance data. In the mismatch efficacy prediction we incorporated both the position and the identity of the nucleotide for single- and double-nucleotide mismatches, taken from experimental data [17], [21]. Dahlgren et al. used only a single siRNA in their study, so not all possible combinations of siRNA and double-nucleotide mismatch were covered. 
Therefore, in case of more than two mismatches or lack of similarity with experimental data we only incorporate average position specific effect from single-nucleotide mismatch [21]. A previous method developed specificity score to find out off-target genes but only considered position specific effect from single-nucleotide mismatch data from Du el al. [15], [22]. However, Du et al. studied the effect of 57 combinations of mismatch while 219 combinations of mismatched out of 228 was covered by Huang et al. across all target position [15], [21]. Thus we implemented most comprehensive data of Huang et al. in desiRm. Several studies showed the importance of mismatch siRNA for targeting disease associated SNP genes without effecting the normal gene [20], [21], [36], [40], [41]. Performance of our method on experimental data showed better correlation coefficient on mismatch efficacy (R = 0.725) than that of SVM model (R = 0.647) indicating usefulness of desiRm for predicting mutant siRNA.
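The enumeration step behind this design can be sketched in a few lines of Python. This is only an illustrative sketch, not the desiRm implementation: the function names `single_mutants` and `mismatch_combinations` and the example sequence are hypothetical. It assumes a 19-nt siRNA and reads the 228 combinations as 19 positions × 4 siRNA bases × 3 non-complementary target bases, an interpretation that the text does not state explicitly.

```python
NUCLEOTIDES = "ACGU"


def single_mutants(sirna):
    """Yield (position, original_base, new_base, mutant) for each single substitution.

    Positions are 1-based; the substituted base is written in lower case.
    """
    sirna = sirna.upper()
    for index, original in enumerate(sirna):
        for base in NUCLEOTIDES:
            if base != original:
                mutant = sirna[:index] + base.lower() + sirna[index + 1:]
                yield index + 1, original, base, mutant


def mismatch_combinations(length=19):
    """Count position x siRNA-base x non-complementary target-base combinations."""
    return length * len(NUCLEOTIDES) * (len(NUCLEOTIDES) - 1)


# Arbitrary 19-mer used only for illustration; not taken from any dataset.
EXAMPLE_SIRNA = "GUCAUCACACUGAAUACCA"
mutants = list(single_mutants(EXAMPLE_SIRNA))
print(len(mutants))             # 57
print(mutants[0])               # (1, 'G', 'A', 'aUCAUCACACUGAAUACCA')
print(mismatch_combinations())  # 228
```

Each mutant would then be scored by the efficacy model and adjusted with the mismatch-tolerance data. That scoring step is left out here because the tolerance values themselves are not given in this section.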

Conclusions

In this study we have developed a method to design siRNAs against fully complementary as well as partially complementary target regions. This novel method helps to obtain an siRNA of the desired efficacy without changing the target site. This is very important because a region of an mRNA may be the best candidate in terms of having the least similarity to non-intended mRNAs while at the same time having the lowest efficacy. Furthermore, our method is helpful for designing siRNAs against SNP-associated disease-causing genes and mutation-prone viruses such as HIV.

Supporting Information

Position-specific effect on efficacy due to single-nucleotide mismatch. Positions 1, 2, 3, 18 and 19 were highly tolerant, i.e. efficacy is least affected. (PDF)

Snapshot of the desiRm input field, where an mRNA can be submitted to obtain siRNAs. (JPG)

Snapshot of the desiRm output with fully complementary siRNAs. Each row contains the siRNA sequence, target position, target sequence and accessibility, with the predicted efficacy. To improve the efficacy of the 197th siRNA, which targets position 164 (highlighted), click on its sequence. (JPG)

Snapshot of the desiRm output with single-mutated siRNAs. Each row contains the mutated siRNA, position of mutation, type of mutation, target sequence and accessibility, with the predicted efficacy. The first sequence (WT) is the original; a mutation at position 1 of the siRNA increases its efficacy to 0.710. Further improvement can be achieved by clicking on the siRNA. (JPG)

Complete CDS of the wild-type prion protein (PRNP) gene. Nucleotides in bold and red indicate the positions of the nucleotide variations reported in mutant genes. Mutant PRNP-P102L has a mutation at position 377 (C→U); mutant PRNP-P105L has a mutation at position 386 (C→U); mutant PRNP-D178N has a mutation at position 564 (G→A). Highlighted regions were targeted by siRNAs in both wild type and mutants by Ohnishi et al. (PDF)

Performance of the SVM-based model for siRNA efficacy prediction developed using a hybrid of features. (DOCX)

Assessment of desiRm on experimentally verified mismatched siRNAs of siPrnp105(T10). siPrnp105(T10) and its various mutant siRNAs were targeted against the prion protein gene (PRNP) and its mutant allele (PRNP-P105L). A mutated base in the siRNA is denoted by a lower-case letter, while a mismatched base between siRNA and target is denoted by a bold letter. Actual efficacy values were taken from the experimental work reported by Ohnishi et al. Predicted efficacy denotes the efficacy predicted by desiRm. All sequences are in the 5′ to 3′ direction. The correlation coefficient between actual and predicted efficacy is R = 0.586. (DOCX)

Assessment of desiRm on experimentally verified mismatched siRNAs of siPrnp102(T10). siPrnp102(T10) and its various mutant siRNAs were targeted against the prion protein gene (PRNP) and its mutant allele (PRNP-P102L). A mutated base in the siRNA is denoted by a lower-case letter, while a mismatched base between siRNA and target is denoted by a bold letter. Actual efficacy values were taken from the experimental work reported by Ohnishi et al. Predicted efficacy denotes the efficacy predicted by desiRm. All sequences are in the 5′ to 3′ direction. The correlation coefficient between actual and predicted efficacy is R = 0.607. (DOCX)

Assessment of desiRm on experimentally verified mismatched siRNAs of siPrnp178(A9). siPrnp178(A9) and its various mutant siRNAs were targeted against the prion protein gene (PRNP) and its mutant allele (PRNP-D178N). A mutated base in the siRNA is denoted by a lower-case letter, while a mismatched base between siRNA and target is denoted by a bold letter. Actual efficacy values were taken from the experimental work reported by Ohnishi et al. Predicted efficacy denotes the efficacy predicted by desiRm. All sequences are in the 5′ to 3′ direction. The correlation coefficient between actual and predicted efficacy is R = 0.666. (DOCX)