Abstract
The activation of cryptic 5' splice sites (5' SSs) is often associated with human hereditary diseases. DNA-based mutation screening strategies are commonly used to recognize cryptic 5' SSs, because features of the local DNA sequence can influence which cryptic 5' SS is chosen. To improve the identification of cryptic 5' SSs, we developed a structure-based method named SPO (structure profiles and odds measure). It combines two parameters, a structural feature derived from the hydroxyl radical cleavage pattern and an odds measure, to assess the likelihood that a cryptic 5' SS is activated in competition with its paired authentic 5' SS. The SPO algorithm achieves higher prediction accuracy than seven current tools for identifying activated cryptic 5' SSs: MaxEnt, MDD, Markov model, weight matrix model, Shapiro and Senapathy matrix, R(i) and ΔG. In addition, the ΔSPO scores predicted by the SPO algorithm correlated more strongly with the strength of cryptic 5' SS activation than the scores from the other seven methods. In conclusion, the SPO algorithm provides optimal identification of cryptic 5' SSs, can be applied in designing mutagenesis experiments for various splicing events, and may help in investigating the relationship between structural variants and human hereditary diseases.
Year: 2012 PMID: 22323516 PMCID: PMC3378896 DOI: 10.1093/nar/gks061
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Flow chart of SPO algorithm.
Figure 2. Sensitivity, specificity, precision, false positive rate, accuracy and F-measure vary with ΔSPO score. (A) Sensitivity and precision vary with ΔSPO score; (B) specificity and false positive rate vary with ΔSPO score; (C) accuracy and F-measure vary with ΔSPO score.
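The six measures named in the Figure 2 caption all follow from the four confusion-matrix counts at a chosen score cutoff. The sketch below uses the standard textbook definitions. It is illustrative only: the function name `metrics_at_threshold`, the toy scores and labels, and the convention that a pair is called "activated" when its score is at or above the cutoff are assumptions, not details taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    precision: float
    false_positive_rate: float
    accuracy: float
    f_measure: float


def _ratio(numerator, denominator):
    # Return 0.0 instead of dividing by zero when a class is empty.
    return numerator / denominator if denominator else 0.0


def metrics_at_threshold(scores, labels, threshold):
    """Compute standard classification measures at one score cutoff.

    labels[i] is True for a positive case. Assumption: a case is
    predicted positive when scores[i] >= threshold.
    """
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y)
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    tn = sum(1 for s, y in pairs if s < threshold and not y)

    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    precision = _ratio(tp, tp + fp)
    f_measure = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return Metrics(
        sensitivity=sensitivity,
        specificity=specificity,
        precision=precision,
        false_positive_rate=1.0 - specificity,
        accuracy=_ratio(tp + tn, len(pairs)),
        f_measure=f_measure,
    )


# Hypothetical toy data, not values from the HMD1 data set.
toy_scores = [2.1, 0.4, -0.3, 1.5, -1.2, 0.1]
toy_labels = [True, True, False, True, False, False]
print(metrics_at_threshold(toy_scores, toy_labels, threshold=0.0))
```

Sweeping `threshold` over a range of values and plotting each field against it would give curves of the kind the caption describes.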
Performance of scoring methods in identifying activated cryptic 5′ SSs based on 490 paired splicing sequences included in the HMD1 data set
| Method | Sensitivity | Specificity | Accuracy | Precision | AUC | |
|---|---|---|---|---|---|---|
| SPO | 0.823 | 0.884 | 0.857 | 0.851 | 0.849 | 0.905 |
| MaxEnt | 0.730 | 0.840 | 0.792 | 0.780 | 0.781 | 0.849 |
| MDD | 0.712 | 0.836 | 0.782 | 0.768 | 0.774 | 0.844 |
| MM | 0.744 | 0.818 | 0.786 | 0.778 | 0.762 | 0.828 |
| WMM | 0.665 | 0.720 | 0.696 | 0.691 | 0.650 | 0.734 |
| S&S | 0.740 | 0.695 | 0.714 | 0.714 | 0.655 | 0.782 |
| R(i) | 0.730 | 0.647 | 0.706 | 0.707 | 0.687 | 0.772 |
| ΔG | 0.679 | 0.609 | 0.667 | 0.667 | 0.658 | 0.730 |
Figure 3. Comparison of predictive accuracy of the scoring methods for identifying activated cryptic 5′ SSs. (A) Sensitivity versus 1 − specificity for the scoring methods; (B) false positive rate versus false negative rate for the scoring methods.
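A plot of sensitivity against 1 − specificity is a receiver operating characteristic (ROC) curve, and the area under it gives an AUC value. As a minimal sketch of how such a curve and its area can be computed, the code below sweeps every observed score as a cutoff and integrates with the trapezoidal rule. The function names, the toy data and the "score at or above cutoff means positive" convention are assumptions for illustration; this is not the authors' evaluation code.

```python
def roc_points(scores, labels):
    """Return (1 - specificity, sensitivity) points, one per distinct cutoff.

    Assumption: a case is predicted positive when its score >= cutoff.
    """
    positives = sum(1 for y in labels if y)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    points = [(0.0, 0.0)]  # cutoff above every score: nothing predicted positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and y)
        fp = sum(1 for s, y in zip(scores, labels) if s >= cutoff and not y)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Integrate a piecewise-linear curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical toy data, not values from the HMD1 data set.
toy_scores = [2.1, 0.4, -0.3, 1.5, -1.2, 0.1]
toy_labels = [True, False, False, True, False, True]
curve = roc_points(toy_scores, toy_labels)
print(curve)
print(area_under_curve(curve))
```

The area computed this way equals the fraction of positive/negative pairs in which the positive case receives the higher score, which is one common way to read an AUC figure.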
Accuracy of scoring methods in different mutant categories
| Mutant type | SPO | MaxEnt | MDD | MM | WMM | S&S | R(i) | ΔG |
|---|---|---|---|---|---|---|---|---|
| Point mutation | 0.822 | 0.723 | 0.708 | 0.738 | 0.535 | 0.629 | 0.728 | 0.678 |
| Deletion | 0.889 | 0.889 | 0.778 | 0.889 | 0.778 | 0.778 | 0.778 | 0.667 |
| Duplication | 1.000 | 1.000 | 1.000 | 1.000 | 0.000 | 0.000 | 1.000 | 1.000 |
| Insertion | 0.667 | 0.667 | 0.667 | 0.667 | 0.667 | 0.667 | 0.667 | 0.667 |
| Total | 0.772 | 0.679 | 0.665 | 0.693 | 0.502 | 0.591 | 0.684 | 0.637 |
Accuracy of scoring methods in competition assays based on 52 (12 strong, 26 intermediate and 14 weak) paired splicing sequences in the HMD2 data set
| Data type | SPO | MaxEnt | MDD | MM | WMM | S&S | R(i) | ΔG |
|---|---|---|---|---|---|---|---|---|
| Strong | 1.000 | 0.833 | 1.000 | 0.917 | 0.917 | 0.917 | 1.000 | 0.917 |
| Intermediate | 0.769 | 0.654 | 0.692 | 0.692 | 0.615 | 0.731 | 0.654 | 0.654 |
| Weak | 1.000 | 1.000 | 0.929 | 1.000 | 0.929 | 1.000 | 0.714 | 0.929 |
| Strong and intermediate | 0.842 | 0.711 | 0.789 | 0.763 | 0.711 | 0.789 | 0.763 | 0.737 |
| Intermediate and weak | 0.850 | 0.775 | 0.775 | 0.800 | 0.725 | 0.825 | 0.675 | 0.750 |
| Strong and weak | 1.000 | 0.923 | 0.962 | 0.962 | 0.923 | 0.962 | 0.846 | 0.923 |
| Total | 0.885 | 0.788 | 0.827 | 0.827 | 0.769 | 0.846 | 0.750 | 0.788 |
Pearson's correlation coefficients of the competition assays of 5′ SSs and their scores in the HMD2 data set
| Data type | ΔSPO | ΔMaxEnt | ΔMDD | ΔMM | ΔWMM | ΔS&S | ΔR(i) | ΔΔG |
|---|---|---|---|---|---|---|---|---|
| CS-I | 0.812 | 0.605 | 0.558 | 0.713 | 0.559 | 0.666 | 0.572 | 0.551 |
| CS-II | 0.837 | 0.881 | 0.852 | 0.873 | 0.754 | 0.885 | 0.764 | 0.718 |
| Total | 0.859 | 0.785 | 0.747 | 0.789 | 0.661 | 0.802 | 0.701 | 0.681 |
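For readers who want to reproduce this kind of comparison on their own data, Pearson's correlation coefficient between a set of score differences and a set of measured activation strengths can be computed directly from its definition. The sketch below is a plain-Python illustration; the function name `pearson_r` and the toy numbers are assumptions and are not drawn from the HMD2 data set.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have the same length")
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy data: score differences and measured activation levels.
toy_score_deltas = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_activation = [2.0, 1.0, 4.0, 3.0, 5.0]
print(pearson_r(toy_score_deltas, toy_activation))
```

A value near 1 indicates that larger score differences go with stronger measured activation; the coefficient captures only linear association, so a rank-based measure may be worth checking alongside it.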