Raul Cruz-Cano, Mei-Ling Ting Lee, Ming-Ying Leung.
Abstract
BACKGROUND: Logic minimization is the application of algebraic axioms to a binary dataset in order to reduce the number of digital variables and/or rules needed to express it. Although logic minimization techniques have been applied to bioinformatics datasets before, they have not been used for classification and rule discovery. In this paper, we propose a method based on logic minimization to extract predictive rules for two bioinformatics problems involving the identification of functional sites in molecular sequences: transcription factor binding sites (TFBS) in DNA and O-glycosylation sites in proteins. TFBS are important in various developmental processes, and glycosylation is a posttranslational modification critical to protein function.
Year: 2012 PMID: 22897894 PMCID: PMC3492099 DOI: 10.1186/1756-0381-5-10
Source DB: PubMed Journal: BioData Min ISSN: 1756-0381 Impact factor: 2.522
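Binary Function before Logic Minimization ('-' in the output column marks a don't-care)
| # | A | B | C | D | Output |
|---|---|---|---|---|---|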
| 1 | 0 | 0 | 0 | 0 | 1 |
| 2 | 0 | 0 | 0 | 1 | 0 |
| 3 | 0 | 0 | 1 | 0 | 1 |
| 4 | 0 | 0 | 1 | 1 | 1 |
| 5 | 0 | 1 | 0 | 0 | 0 |
| 6 | 0 | 1 | 0 | 1 | 0 |
| 7 | 0 | 1 | 1 | 0 | - |
| 8 | 0 | 1 | 1 | 1 | - |
| 9 | 1 | 0 | 0 | 0 | 1 |
| 10 | 1 | 0 | 0 | 1 | 0 |
| 11 | 1 | 0 | 1 | 0 | 1 |
| 12 | 1 | 0 | 1 | 1 | 1 |
| 13 | 1 | 1 | 0 | 0 | 0 |
| 14 | 1 | 1 | 0 | 1 | 0 |
| 15 | 1 | 1 | 1 | 0 | - |
| 16 | 1 | 1 | 1 | 1 | - |
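Binary Function after Logic Minimization ('-' in an input column means the rule does not depend on that input)
| # | A | B | C | D | Output |
|---|---|---|---|---|---|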
| 1 | - | 0 | - | 0 | 1 |
| 2 | - | - | 1 | - | 1 |
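A minimal check that the two minimized rules reproduce the original table. The sketch labels the four input columns A to D (the source gives no column names), encodes each rule as a cube string in which '-' means the input is ignored, and treats the rules as an OR of AND-terms; rows whose output is '-' are skipped as don't-cares. The function names are hypothetical.

```python
# Truth table from the "before" table: (A, B, C, D) -> output; None marks a don't-care ('-').
TRUTH_TABLE = {
    (0, 0, 0, 0): 1, (0, 0, 0, 1): 0, (0, 0, 1, 0): 1, (0, 0, 1, 1): 1,
    (0, 1, 0, 0): 0, (0, 1, 0, 1): 0, (0, 1, 1, 0): None, (0, 1, 1, 1): None,
    (1, 0, 0, 0): 1, (1, 0, 0, 1): 0, (1, 0, 1, 0): 1, (1, 0, 1, 1): 1,
    (1, 1, 0, 0): 0, (1, 1, 0, 1): 0, (1, 1, 1, 0): None, (1, 1, 1, 1): None,
}

# Minimized rules from the "after" table, written as cubes over A, B, C, D.
MINIMIZED_CUBES = ["-0-0", "--1-"]

def cube_covers(cube: str, bits: tuple[int, ...]) -> bool:
    """A cube covers an input if every non-'-' position equals the input bit."""
    return all(c == "-" or int(c) == b for c, b in zip(cube, bits))

def cover_output(cubes: list[str], bits: tuple[int, ...]) -> int:
    """Sum of products: the output is 1 if any cube covers the input."""
    return int(any(cube_covers(c, bits) for c in cubes))

def agrees_on_care_set(cubes: list[str], table: dict) -> bool:
    """Compare the cover with every row whose output is specified."""
    return all(cover_output(cubes, bits) == out
               for bits, out in table.items() if out is not None)

print(agrees_on_care_set(MINIMIZED_CUBES, TRUTH_TABLE))  # True
```

In this example the cover assigns 1 to all four don't-care rows (they all have C = 1), which is what lets the second rule depend on C alone; if those rows were fixed to 0, that rule would also need B = 0.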
Binary Function represented in a table before minimization
| # | A | B | C | Output |
|---|---|---|---|---|
| 1 | 1 | 1 | 1 | 1 |
| 2 | 0 | 1 | 1 | 1 |
Binary Function represented in a table after minimization
| # | A | B | C | Output |
|---|---|---|---|---|
| 1 | - | 1 | 1 | 1 |
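The two small tables above illustrate the basic merging step: two terms that differ in exactly one input collapse into a single term with that input replaced by '-', following the identity x*y + x'*y = y. The sketch below implements only this single step (the combining step used in the Quine-McCluskey method); it is not how Espresso, a heuristic minimizer, works internally, and `merge_cubes` is a hypothetical name.

```python
from typing import Optional

def merge_cubes(a: str, b: str) -> Optional[str]:
    """Combine two cubes that differ in exactly one specified position.

    Uses x*y + x'*y = y: the differing position becomes '-'.
    Returns None when the cubes cannot be merged this way.
    """
    if len(a) != len(b):
        raise ValueError("cubes must have the same length")
    diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diffs) != 1:
        return None
    i = diffs[0]
    if "-" in (a[i], b[i]):  # '-' against 0/1 is not a complementary pair
        return None
    return a[:i] + "-" + a[i + 1:]

print(merge_cubes("111", "011"))  # -11
print(merge_cubes("111", "001"))  # None
```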
Figure 1. Logic Minimization-Based Algorithm.
Input/Output Patterns (real-valued inputs)
| Pattern | Input 1 | Input 2 | Output |
|---|---|---|---|
| 1 | .8 | .9 | 1 |
| 2 | .7 | .8 | 0 |
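Input/Output Patterns after conversion to binary (both patterns now have identical inputs but different outputs)
| Pattern | Input 1 | Input 2 | Output |
|---|---|---|---|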
| 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 0 |
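The two pattern tables show how converting real-valued inputs to binary can make distinct patterns indistinguishable. The source does not state the conversion rule; the sketch assumes a simple threshold of 0.5 (any threshold at or below 0.7 gives the same binary vectors here) and reports binary inputs that receive conflicting outputs. `binarize` and `find_conflicts` are hypothetical helpers.

```python
from collections import defaultdict

# Real-valued patterns from the first table: (inputs, output).
PATTERNS = [((0.8, 0.9), 1), ((0.7, 0.8), 0)]

def binarize(values, threshold=0.5):
    """Map each value to 1 if it is >= threshold, else 0 (the threshold is an assumption)."""
    return tuple(int(v >= threshold) for v in values)

def find_conflicts(patterns, threshold=0.5):
    """Return binary input vectors that end up with more than one output label."""
    outputs = defaultdict(set)
    for inputs, out in patterns:
        outputs[binarize(inputs, threshold)].add(out)
    return {bits: outs for bits, outs in outputs.items() if len(outs) > 1}

print(find_conflicts(PATTERNS))  # {(1, 1): {0, 1}}
```

A conflict such as (1, 1) -> {0, 1} means no Boolean function of these two binary inputs can reproduce both patterns as stated.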
Variables deemed necessary by Espresso to predict TFBS
| Rank | Variable no. | Description |
|---|---|---|
| 1 | 26 | Nucleotide 7 is/is not C |
| 2 | 33 | Nucleotide 9 is/is not A |
| 3 | 29 | Nucleotide 8 is/is not A |
| 4 | 40 | Nucleotide 10 is/is not T |
| 5 | 23 | Nucleotide 6 is/is not G |
| 6 | 39 | Nucleotide 10 is/is not G |
| 7 | 25 | Nucleotide 7 is/is not A |
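The variable numbers in the table are consistent with a one-hot encoding that uses four binary variables per nucleotide position, ordered A, C, G, T (for example, 26 = 4 x 6 + 2 for "nucleotide 7 is C"). This layout is inferred from the seven rows above rather than stated in the source, and the helper names below are hypothetical.

```python
# Inferred layout (not stated in the source): four indicator variables per
# nucleotide position, in A, C, G, T order, numbered from 1.
NUCLEOTIDES = "ACGT"

def one_hot(seq: str) -> list[int]:
    """Binary vector whose (variable number - 1)-th entry is 1 if that base is present."""
    return [int(base == b) for base in seq.upper() for b in NUCLEOTIDES]

def variable_index(position: int, base: str) -> int:
    """1-based variable number for 'nucleotide <position> is <base>'."""
    return 4 * (position - 1) + NUCLEOTIDES.index(base) + 1

def describe(index: int) -> tuple[int, str]:
    """Inverse mapping: variable number -> (nucleotide position, base)."""
    position, offset = divmod(index - 1, 4)
    return position + 1, NUCLEOTIDES[offset]

# The seven variables listed in the table above.
TOP7 = {26: (7, "C"), 33: (9, "A"), 29: (8, "A"), 40: (10, "T"),
        23: (6, "G"), 39: (10, "G"), 25: (7, "A")}
print(all(describe(i) == d and variable_index(*d) == i for i, d in TOP7.items()))  # True
```

Under this layout, `one_hot(seq)[variable_index(p, b) - 1]` is 1 exactly when base b occurs at position p of `seq`, which is the "is/is not" reading of each variable.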
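Rules for Prediction of TFBS after Logic Minimization
| Rule | Var 1 | Var 2 | Var 3 | Var 4 | Var 5 | Var 6 | Var 7 | Output |
|---|---|---|---|---|---|---|---|---|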
| 1 | - | 1 | 1 | 1 | 1 | - | - | 1 |
| 2 | 1 | 1 | 1 | 1 | - | 0 | 0 | 1 |
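Read as a sum of products, the two rules predict a binding site when all the variables specified by either rule match. The sketch assumes that the seven rule columns are the seven binary variables in a fixed order and that an input matched by no rule is predicted as a non-site; both are assumptions, since the column-to-variable mapping is not given in the source. The function names are hypothetical.

```python
# Rules from the table above, one character per binary variable (columns 1..7,
# order assumed); '-' means the rule ignores that variable. Both rules output 1.
RULES = ["-1111--", "1111-00"]

def rule_matches(rule: str, x: list[int]) -> bool:
    """True if every variable the rule specifies has the required value."""
    return all(r == "-" or int(r) == v for r, v in zip(rule, x))

def predict(x: list[int], rules: list[str] = RULES) -> int:
    """1 (binding site) if any rule fires, otherwise 0 (assumed default)."""
    if any(len(r) != len(x) for r in rules):
        raise ValueError("input length must match rule length")
    return int(any(rule_matches(r, x) for r in rules))

def to_expression(rule: str) -> str:
    """Human-readable AND-term for one rule."""
    terms = [f"x{i}" if r == "1" else f"NOT x{i}"
             for i, r in enumerate(rule, start=1) if r != "-"]
    return " AND ".join(terms)

for r in RULES:
    print(to_expression(r))
# x2 AND x3 AND x4 AND x5
# x1 AND x2 AND x3 AND x4 AND NOT x6 AND NOT x7
print(predict([1, 1, 1, 1, 0, 0, 0]))  # 1 (second rule fires)
print(predict([1, 1, 1, 1, 0, 1, 1]))  # 0
```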
Comparison of Performance Results for Prediction of TFBS using the Top 7 Binary Variables
| Digital Logic Rules | 89.30 | 2.46 |
| Top Digital Logic | 82.14 | 5.78 |
| SVM | 87.50 | 2.80 |
| ANN | 55.36 | 1.77 |
Comparison of Performance Results for Prediction of O-glycosylated Sites using the Top 20 Binary Variables
| Digital Logic Rules | 91.23 | 66.60 | 89.40 | 66.82 | 90.99 | 66.11 |
| Top 45% Digital Logic | 84.76 | 80.61 | 71.70 | 72.97 | 83.06 | 79.62 |
| SVM | 74.47 | 95.48 | 70.26 | 69.52 | 73.92 | 92.11 |
| ANN | 74.04 | 95.05 | 65.65 | 57.95 | 72.95 | 90.11 |
Top 20 Variables deemed necessary by SVM-RFE to predict O-glycosylated S Sites
| Rank | Variable no. | Description |
|---|---|---|
| 1 | 183 | Amino Acid in Position 3 is/is not P |
| 2 | 101 | Amino Acid in Position −1 is/is not T |
| 3 | 143 | Amino Acid in Position 1 is/is not T |
| 4 | 204 | Amino Acid in Position 4 is/is not P |
| 5 | 99 | Amino Acid in Position −1 is/is not P |
| 6 | 185 | Amino Acid in Position 3 is/is not T |
| 7 | 206 | Amino Acid in Position 4 is/is not T |
| 8 | 163 | Amino Acid in Position 2 is/is not S |
| 9 | 88 | Amino Acid in Position −1 is/is not D |
| 10 | 36 | Amino Acid in Position −4 is/is not P |
| 11 | 180 | Amino Acid in Position 3 is/is not K |
| 12 | 100 | Amino Acid in Position −1 is/is not S |
| 13 | 2 | Amino Acid in Position −5 is/is not R |
| 14 | 162 | Amino Acid in Position 2 is/is not P |
| 15 | 227 | Amino Acid in Position 5 is/is not T |
| 16 | 78 | Amino Acid in Position −2 is/is not P |
| 17 | 142 | Amino Acid in Position 1 is/is not S |
| 18 | 164 | Amino Acid in Position 2 is/is not T |
| 19 | 74 | Amino Acid in Position −2 is/is not L |
| 20 | 148 | Amino Acid in Position 2 is/is not A |
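Every variable number in the S-site table fits a one-hot layout with 21 slots per position over a window from position -5 to +5 (the candidate residue itself at position 0), with the 20 amino acids in the order ARNDCQEGHILKMFPSTWYV. For example, 2 is "position -5 is R" and 183 = 21 x 8 + 15 is "position 3 is P". The layout and the 21st slot, whose symbol the source does not identify, are inferences; the function names are hypothetical.

```python
# Inferred layout (not stated in the source): 21 binary variables per position,
# positions -5..+5 with the candidate S/T residue at position 0.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # inferred order of the first 20 slots
SLOTS_PER_POSITION = 21               # 20 residues + 1 unidentified extra slot
FIRST_POSITION = -5

def variable_index(position: int, residue: str) -> int:
    """1-based variable number for 'amino acid in <position> is <residue>'."""
    return SLOTS_PER_POSITION * (position - FIRST_POSITION) + AMINO_ACIDS.index(residue) + 1

def describe(index: int) -> tuple[int, str]:
    """Inverse mapping; '?' marks the unidentified 21st slot."""
    block, offset = divmod(index - 1, SLOTS_PER_POSITION)
    residue = AMINO_ACIDS[offset] if offset < len(AMINO_ACIDS) else "?"
    return block + FIRST_POSITION, residue

# Variable numbers and descriptions from the S-site table above.
S_SITE_TOP20 = {
    183: (3, "P"), 101: (-1, "T"), 143: (1, "T"), 204: (4, "P"), 99: (-1, "P"),
    185: (3, "T"), 206: (4, "T"), 163: (2, "S"), 88: (-1, "D"), 36: (-4, "P"),
    180: (3, "K"), 100: (-1, "S"), 2: (-5, "R"), 162: (2, "P"), 227: (5, "T"),
    78: (-2, "P"), 142: (1, "S"), 164: (2, "T"), 74: (-2, "L"), 148: (2, "A"),
}

consistent = all(describe(i) == d and variable_index(*d) == i
                 for i, d in S_SITE_TOP20.items())
print(consistent)  # True
```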
Top 20 Variables deemed necessary by SVM-RFE to predict O-glycosylated T Sites
| Rank | Variable no. | Description |
|---|---|---|
| 1 | 15 | Amino Acid in Position −5 is/is not P |
| 2 | 206 | Amino Acid in Position 4 is/is not T |
| 3 | 164 | Amino Acid in Position 2 is/is not T |
| 4 | 38 | Amino Acid in Position −4 is/is not T |
| 5 | 227 | Amino Acid in Position 5 is/is not T |
| 6 | 183 | Amino Acid in Position 3 is/is not P |
| 7 | 143 | Amino Acid in Position 1 is/is not T |
| 8 | 185 | Amino Acid in Position 3 is/is not T |
| 9 | 101 | Amino Acid in Position −1 is/is not T |
| 10 | 99 | Amino Acid in Position −1 is/is not P |
| 11 | 80 | Amino Acid in Position −2 is/is not T |
| 12 | 17 | Amino Acid in Position 5 is/is not T |
| 13 | 225 | Amino Acid in Position −4 is/is not A |
| 14 | 22 | Amino Acid in Position −1 is/is not A |
| 15 | 85 | Amino Acid in Position 2 is/is not P |
| 16 | 162 | Amino Acid in Position −3 is/is not P |
| 17 | 57 | Amino Acid in Position 3 is/is not P |
| 18 | 142 | Amino Acid in Position 1 is/is not S |
| 19 | 59 | Amino Acid in Position −3 is/is not T |
| 20 | 169 | Amino Acid in Position 3 is/is not A |
Most popular rule for S O-glycosylated sites
| −1 | T | >1 |
| 1 | not S | <1 |
| 2 | A | >1 |
| 4 | T | =1 |
| 5 | not T | =1 |
| then we have found an O-glycosylated S |
Most popular rule for T O-glycosylated sites
| −4 | T | =1 |
| −2 | T | >1 |
| 1 | not T | <1 |
| 2 | T | >1 |
| 3 | not T | <1 |
| 4 | T | >1 |
| 5 | not T | <1 |
| then we have found an O-glycosylated T |