Aleksandra Gruca, Marek Sikora.
Abstract
BACKGROUND: High-throughput methods in molecular biology have provided researchers with an abundance of experimental data that must be interpreted before the experimental results can be understood. Manual functional interpretation of gene/protein groups is expensive and time-consuming; therefore, there is a need for new, efficient data mining methods and bioinformatics tools that support the expert in the functional analysis of experimental results.
Keywords: Expert-driven rule induction; Functional description; Gene Ontology; Logical rules
Year: 2017 PMID: 28651634 PMCID: PMC5483958 DOI: 10.1186/s13326-017-0129-x
Source DB: PubMed Journal: J Biomed Semantics
Fig. 1 The workflow of the induction of logical rules for functional description
Table 1 Comparison of different logical rule generation methods and different parameter settings
| | S01 | S02 | S03 | S04(1) | S04(2) | S04(3) | S04(4) | S04(5) | S04(6) |
|---|---|---|---|---|---|---|---|---|---|
| No. of rules before filtering | 3812 | 3812 | 3812 | 110 | 110 | 110 | 110 | 110 | 110 |
| No. of output rules | 3812 | 32 | 32 | 9 | 10 | 7 | 19 | 110 | 14 |
| No. of rules with expert terms | 1465 | 15 | 11 | 9 | 10 | 7 | 19 | 110 | 14 |
| Coverage | | | | 64 | 64 | 64 | 64 | 64 | 64 |
| Avg. p-value | 0.018 | 0.017 | 0.014 | | 0.012 | 0.013 | 0.019 | 0.016 | 0.014 |
| Avg. precision | 0.74 | 0.78 | 0.77 | | 0.78 | 0.7 | 0.68 | 0.71 | 0.72 |
| Avg. coverage | 0.14 | 0.15 | 0.15 | 0.16 | 0.16 | 0.16 | 0.16 | 0.15 | |
| Avg. GO Level | 4.06 | 4.18 | 3.7 | 4.95 | 4.84 | | 4.66 | 4.51 | 4.7 |
| Positive coverage | | | | 14 | 14 | 14 | 14 | 14 | 14 |
| Negative coverage | 57 | 35 | 36 | | 12 | 13 | 19 | 20 | 14 |
| Positive coverage - expert rules | | 13 | 11 | | | | | | |
| Negative coverage - expert rules | 28 | 10 | | | 12 | 13 | 19 | 20 | 14 |
| Avg. no. of descriptors | | 3.19 | 3.53 | 2.33 | 2.5 | 1.43 | 2.47 | 2.66 | 2.36 |
| Avg. no. of expert term per rule | 0.41 | 0.47 | 0.38 | | 1.4 | 1.14 | 1.53 | 1.35 | 1.29 |
| Number of distinctive expert terms | | 8 | 6 | 9 | 9 | 8 | 13 | | 11 |
S01 – RuleGO method without the filtering procedure, S02 – standard RuleGO method with filtering applied, S03 – filtering using the UTA approach, S04 – new rule generation approach using seed terms. The Q Compound measure and filtering settings for S04(1)-S04(6) are described in Table 2
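One way to read the second and third rows of Table 1 is as the share of output rules that contain expert terms. The following is a minimal Python sketch of that arithmetic, using only the counts printed in the table; the names `RULE_COUNTS`, `expert_share`, and `shares` are illustrative and do not come from the paper.

```python
# Counts copied from Table 1: (no. of output rules, no. of rules with expert terms).
RULE_COUNTS = {
    "S01": (3812, 1465),
    "S02": (32, 15),
    "S03": (32, 11),
    "S04(1)": (9, 9),
    "S04(2)": (10, 10),
    "S04(3)": (7, 7),
    "S04(4)": (19, 19),
    "S04(5)": (110, 110),
    "S04(6)": (14, 14),
}


def expert_share(output_rules: int, expert_rules: int) -> float:
    """Fraction of output rules that contain expert terms."""
    if output_rules <= 0:
        raise ValueError("output_rules must be positive")
    return expert_rules / output_rules


# Share of expert-term rules for every method listed in Table 1.
shares = {method: expert_share(*counts) for method, counts in RULE_COUNTS.items()}

if __name__ == "__main__":
    for method, share in shares.items():
        print(f"{method:7s} {share:6.1%}")
```

Under this reading, every S04 variant has a share of 1.0, while S01, S02, and S03 all fall below one half.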
Table 2 List of parameters used for the different rule induction methods presented in Table 1
| Dataset | | S01 | S02 | S03 | S04(1) | S04(2) | S04(3) | S04(4) | S04(5) | S04(6) |
|---|---|---|---|---|---|---|---|---|---|---|
| QCompound | mYAILS | YES | YES | YES | YES | | | | | |
| QCompound | Length | YES | YES | YES | YES | YES | | | | YES |
| QCompound | GO_Depth | YES | YES | YES | YES | YES | YES | YES | YES | YES |
| Filtering | 1st level | YES | YES | YES | YES | YES | YES | YES | | YES |
| Filtering | 2nd level | YES | YES | YES | YES | YES | YES | | | |
YES means that the given component of the Q Compound measure, or the given step of the filtering procedure, was applied during rule generation. NO means that the component was removed from the Q Compound formula or that the filtering step was not applied. Columns correspond to the different rule induction approaches and match the columns of Table 1
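The YES/NO switches for the Q Compound components can be pictured as a mask over the factors of a compound score. The sketch below is a minimal Python illustration under a loudly stated assumption: the enabled component scores are combined by multiplication, and a removed component simply drops out of the product. The combination rule, the function name `q_compound`, and the example scores are all hypothetical; only the component names are taken from Table 2.

```python
from math import prod
from typing import Mapping

COMPONENTS = ("mYAILS", "Length", "GO_Depth")  # row labels from Table 2


def q_compound(scores: Mapping[str, float], enabled: Mapping[str, bool]) -> float:
    """Combine the enabled component scores into a single value.

    ASSUMPTION: components are multiplied. The text above does not
    state the formula, so this is only an illustration.
    """
    missing = [c for c in COMPONENTS if c not in scores or c not in enabled]
    if missing:
        raise KeyError(f"missing components: {missing}")
    # A component switched off ("NO") is left out of the product.
    return prod(scores[c] for c in COMPONENTS if enabled[c])


# Hypothetical component scores for a single rule (not data from the paper).
example_scores = {"mYAILS": 0.8, "Length": 0.5, "GO_Depth": 0.6}

all_on = {"mYAILS": True, "Length": True, "GO_Depth": True}
no_length = {"mYAILS": True, "Length": False, "GO_Depth": True}

full_score = q_compound(example_scores, all_on)        # all three factors
reduced_score = q_compound(example_scores, no_length)  # Length removed
```

Keeping the switches in a separate mapping mirrors the layout of Table 2, where each column is one on/off configuration applied to the same set of components.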
Fig. 2 Visualization of rule networks obtained for the selected rule induction methods. Each circle represents the rule network obtained with a different method. a – rule network set S01, b – rule network set S02, c – rule network set S04(1), d – rule network set S04(5)