Xueqiu Jian, Eric Boerwinkle, Xiaoming Liu.
Abstract
RNA splicing is the process during which introns are excised and exons are spliced together. The precise recognition of splicing signals is critical to this process, and mutations affecting splicing account for a considerable proportion of genetic disease etiology. Analysis of RNA samples from the patient is the most straightforward and reliable method to detect splicing defects. However, technical limitations currently prohibit its use in routine clinical practice. In silico tools that predict the potential consequences of splicing mutations may be useful in daily diagnostic activities. In this review, we provide medical geneticists with basic insights into some of the most popular in silico tools for splicing defect prediction, from the viewpoint of end users. Bioinformaticians in relevant areas who work with huge data sets may also benefit from this review. Specifically, we focus on tools whose primary goal is to predict the impact of mutations within the 5′ and 3′ splicing consensus regions: the algorithms used by the different tools, as well as their major advantages and disadvantages, are briefly introduced; the formats of their input and output are summarized; and their interpretation, evaluation, and prospects are discussed.
Year: 2013 PMID: 24263461 PMCID: PMC4029872 DOI: 10.1038/gim.2013.176
Source DB: PubMed Journal: Genet Med ISSN: 1098-3600 Impact factor: 8.822
Figure 1. Schematic illustration of pre-mRNA splicing. 5′ ss and 3′ ss are recognized by the spliceosome and the intron is excised and exons are spliced. The whole process is regulated by trans-acting elements such as SR proteins, hnRNPs, and the regulatory complex.
A hypothetical example of a position weight matrix (PWM).*
| Nucleotide | Site
| |||||
|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | |
|
| ||||||
| 0.17 | 0.00 | 0.00 | 0.05 | |||
|
| ||||||
| 0.05 | 0.26 | 0.00 | 0.00 | |||
|
| ||||||
| 0.15 | 0.00 | 0.00 | 0.16 | 0.22 | ||
|
| ||||||
| 0.07 | 0.00 | 0.00 | 0.30 | 0.16 | ||
For each site, the frequencies of the different nucleotides observed in a set of aligned sequences are calculated to construct the PWM, which is then used to score and rank a sequence. For example, the sequence ACGTTA is the most likely to be observed in the population and has the highest score, while the sequence TGACAT is one of the most unlikely and has the lowest score. The formula used to calculate the score varies between 5′ and 3′ splice sites and between different PWM algorithms.
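The scoring idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the formula of any particular tool: the aligned sequences are invented, the function names `build_pwm` and `log_score` are hypothetical, and the score is assumed to be the sum of log2 site frequencies, whereas real tools use their own formulas (which differ between 5′ and 3′ sites).

```python
import math

BASES = "ACGT"

def build_pwm(aligned, pseudocount=0.0):
    """Return per-site nucleotide frequencies for equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same length")
    pwm = []
    for i in range(length):
        column = [s[i] for s in aligned]
        total = len(column) + pseudocount * len(BASES)
        pwm.append({b: (column.count(b) + pseudocount) / total for b in BASES})
    return pwm

def log_score(pwm, seq):
    """Assumed scoring rule: sum of log2 frequencies of the observed bases.

    Returns -inf if a base was never observed at its site.
    """
    score = 0.0
    for freqs, base in zip(pwm, seq):
        f = freqs[base]
        if f == 0.0:
            return float("-inf")
        score += math.log2(f)
    return score

# Hypothetical aligned 6-mers, invented for illustration only.
training = ["ACGTTA", "ACGTTA", "ACGTGA", "GCGTTA", "AAGTTC", "CTGTCG"]
pwm = build_pwm(training)

# Rank candidate sequences from most to least consensus-like.
candidates = ["ACGTTA", "ACGTGA", "TGACAT"]
ranked = sorted(candidates, key=lambda s: log_score(pwm, s), reverse=True)
for s in ranked:
    print(s, log_score(pwm, s))
```

With this toy training set, ACGTTA ranks first and TGACAT last, because T was never observed at site 1. Passing a small `pseudocount` avoids infinite scores for unseen bases, at the cost of slightly flattening the frequencies.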
Summary of input, output, and interpretation of prediction scores for selected currently available in silico tools for 5′ and 3′ splice site prediction with a user-friendly web interface.
| Tool | Input | Output | Interpretation |
|---|---|---|---|
| | Single/multiple sequences (5′: 9 bp (−3~+6); 3′: 15 bp (−14~+1)) | S & S score (0~100) | Higher score implies an ss sequence more similar to the consensus sequence |
| | Single sequence (200 bp < length < 80,000 bp) | Confidence score (0~1) | Higher score implies higher confidence of a true site |
| | Single/multiple sequences | Score (0~1) | Higher score implies a more likely splice site |
| | Single sequence ≤ 1 million bp | Probability score (0~1) | Higher score implies a higher probability of a correct exon |
| | Single sequence ≤ 31,000 bp | S & S score (0~100) | Higher score implies an ss sequence more similar to the consensus sequence |
| | Single/multiple 11 bp sequences (−3~+8) containing GT in +1/+2 or one genomic sequence | Hbond score | Higher score implies a stronger capability of forming H-bonds with U1 snRNA |
| | Single/multiple sequences (5′: 9 bp (−3~+6); 3′: 23 bp (−20~+3)) | Maximum entropy score (log-odds ratio) | Higher score implies a higher probability that the sequence is a true splice site |
| | Single/multiple sequences | *-value (3~15) determined by p, rho and gamma values | Higher value implies a more reliable predicted splice site |
| | Mutation to be analyzed and the reference sequence | Information content (Ri) | Color-coded by direction and type of change in Ri |
| | Single/multiple sequences ≤ 30,000 bp | FGA score | Higher score implies a more precise prediction of the splice site |
| | Single sequence ≤ 5,000 bp | S & S score (0~100) | Higher score implies a more likely splice site |
| | Single/multiple sequences ≤ 4,000 bp containing one exon in upper case and flanking intronic sequence ≥ 4 bp in lower case | Probability of cryptic ss activation (0~1) | Higher value implies a higher probability of cryptic ss activation as opposed to exon skipping |
| | Target exon along with two flanking introns | Different scores with their percentile scores (0~1) | Higher percentile score implies a higher ranking of the ss within pre-calculated distributions |
| | Single sequence containing the SNP(s) and the Ensembl gene ID to which the SNP(s) belong(s) | Classification of the probability for a change in splicing | Probable, likely, or unlikely |
| | Single/multiple sequences with one mutation and ≥ 5 bp on each side of the mutation | L1 distance and percentile rank | Higher percentile rank implies a higher likelihood that the point mutation disrupts splicing |
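Several of the inputs listed in the table are fixed-length windows around a splice site, for example 9 bp spanning −3~+6 for a 5′ ss, or 23 bp spanning −20~+3 for a 3′ ss. The sketch below shows one way such windows might be cut from a longer sequence once the exon/intron boundaries are known. It assumes the usual convention that there is no position 0 (at the 5′ ss, −1 is the last exon base and +1 the first intron base; at the 3′ ss, −1 is the last intron base and +1 the first exon base). The example sequence and the function names `donor_window` and `acceptor_window` are hypothetical and do not come from any of the tools.

```python
def donor_window(seq, first_intron_idx, exon_bases=3, intron_bases=6):
    """5' ss window: last `exon_bases` of the exon plus first `intron_bases` of the intron.

    `first_intron_idx` is the 0-based index of position +1 (the G of GT).
    """
    start = first_intron_idx - exon_bases
    stop = first_intron_idx + intron_bases
    if start < 0 or stop > len(seq):
        raise ValueError("not enough flanking sequence for the 5' ss window")
    return seq[start:stop]

def acceptor_window(seq, first_exon_idx, intron_bases=20, exon_bases=3):
    """3' ss window: last `intron_bases` of the intron plus first `exon_bases` of the exon.

    `first_exon_idx` is the 0-based index of position +1 (first base after AG).
    """
    start = first_exon_idx - intron_bases
    stop = first_exon_idx + exon_bases
    if start < 0 or stop > len(seq):
        raise ValueError("not enough flanking sequence for the 3' ss window")
    return seq[start:stop]

# Hypothetical exon-intron-exon sequence, invented for illustration only.
exon1 = "ATGGCCAAG"
intron = "GTAAGTCACTGACC" + "TTTCTTTCCTTTCTCTACAG"
exon2 = "GTTCAAGGA"
seq = exon1 + intron + exon2

donor_idx = len(exon1)                   # first intron base
acceptor_idx = len(exon1) + len(intron)  # first base of the downstream exon

donor9 = donor_window(seq, donor_idx)            # -3~+6, 9 bp
acceptor23 = acceptor_window(seq, acceptor_idx)  # -20~+3, 23 bp
acceptor15 = acceptor_window(seq, acceptor_idx, intron_bases=14, exon_bases=1)  # -14~+1, 15 bp

print(donor9)
print(acceptor23)
print(acceptor15)
```

Checking that the 5′ window carries GT at +1/+2 and that the 3′ window carries AG at −2/−1 before submission is a cheap guard against off-by-one errors in the boundary coordinates.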
Selected recent publications whose primary goal (or one of the goals) was to evaluate in silico tools for splicing defect prediction.
| Number of variants | Gene(s) | Prediction tools evaluated | Year [Reference] |
|---|---|---|---|
| 39 | | NNSplice, PWM, MaxEntScan, ASSA, ESEfinder, RESCUE-ESE | 2008[ |
| 18 | | MaxEntScan, NNSplice, NetGene2 | 2009[ |
| 29 | | NNSplice, NetGene2, PWM, ASSA, MaxEntScan, HSF | 2009[ |
| 623 | Multiple | GENSCAN, GeneSplicer, HSF, MaxEntScan, NNSplice, SplicePort, SplicePredictor, SpliceView, SROOGLE | 2010[ |
| 53 | | PWM, GeneSplicer, NNSplice, MaxEntScan, HSF | 2011[ |
| 272 | | NNSplice, PWM, MaxEntScan, ESEfinder, RESCUE-ESE, HSF | 2012[ |
| 24 | | PWM, MaxEntScan, NNSplice, GeneSplicer, HSF, NetGene2, SpliceView, SplicePredictor, ASSA | 2013[ |
ESEfinder and RESCUE-ESE are web tools that predict ESEs.