Augusto Anguita-Ruiz, Alberto Segura-Delgado, Rafael Alcalá, Concepción M Aguilera, Jesús Alcalá-Fdez.
Abstract
To date, several machine learning approaches have been proposed for the dynamic modeling of temporal omics data. Although they have yielded impressive results in terms of model accuracy and predictive ability, most of these applications are based on "black-box" algorithms, and the research community has called for more interpretable models. The recent eXplainable Artificial Intelligence (XAI) revolution offers a solution to this issue, in which rule-based approaches are highly suitable for explanatory purposes. Integrating the data mining process with functional-annotation and pathway analyses is a further step towards more explanatory and biologically sound models. In this paper, we present a novel rule-based XAI strategy (including pre-processing, knowledge extraction and functional validation) for finding biologically relevant sequential patterns in longitudinal human gene expression data (GED). To illustrate the performance of our pipeline, we work on in vivo temporal GED collected over the course of a long-term dietary intervention in 57 subjects with obesity (GSE77962). As validation populations, we employ three independent datasets following the same experimental design. As a result, we validate the initially extracted gene patterns and demonstrate the suitability of our strategy for mining biologically relevant gene-gene temporal relations. Our whole pipeline is available as open-source software and could easily be extended to other human temporal GED applications.
Year: 2020 PMID: 32275707 PMCID: PMC7176286 DOI: 10.1371/journal.pcbi.1007792
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Fig 1. Assignable categories and ranking scores for a rule in the biological measures “BP”, “MF”, “CC” and “SP”.
First, a rule is assigned to a category, from the bottom category 5 to the top category 1. The assignment depends on the types of matches found among the rule's annotated terms, as described in the figure. Once a rule has been assigned to a category, a score is computed that takes into account all types of matches found for the rule. Each match is weighted with a number of points, as illustrated in the figure. The final score for a rule is computed as detailed in the Methods section.
Fig 2. Role of biological quality measurements for the functional assessment of each discovered gene-gene relationship.
Although all extracted rules have acceptable and identical quality metrics (support = 90% and confidence = 85%), only rule 1 has a good BP value (recall that the BP measure ranges from 5 to 1, with values near 1 corresponding to a higher number of GO matches between LHS and RHS genes). Conversely, only rule 4 has a good value for the TF measure (which ranges from 0 to 3, with 3 being the maximum score, indicating a true TF-target gene relationship). The figure illustrates how functional validation of the results is critical for distinguishing spurious associations from true interactions.
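
As a rough illustration of the two traditional metrics named in this caption, the sketch below computes support and confidence for a rule LHS → RHS using their standard association-rule definitions. The toy transactions, the item labels and the function names are hypothetical, and the sketch ignores the temporal ordering that the sequential rules in the paper take into account.

```python
from typing import FrozenSet, List


def support(transactions: List[FrozenSet[str]], items: FrozenSet[str]) -> float:
    """Fraction of transactions that contain every item in `items`."""
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(transactions: List[FrozenSet[str]],
               lhs: FrozenSet[str], rhs: FrozenSet[str]) -> float:
    """Support of LHS and RHS together, divided by the support of LHS alone."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        raise ValueError("LHS never occurs; confidence is undefined")
    return support(transactions, lhs | rhs) / lhs_support


# Hypothetical toy data: one set of discretised "gene=state" items per subject.
transactions = [
    frozenset({"A=1", "B=2"}),
    frozenset({"A=1", "B=2"}),
    frozenset({"A=1", "B=1"}),
    frozenset({"A=2", "B=2"}),
]
lhs = frozenset({"A=1"})
rhs = frozenset({"B=2"})

rule_support = support(transactions, lhs | rhs)
rule_confidence = confidence(transactions, lhs, rhs)
print(f"support={rule_support:.2f} confidence={rule_confidence:.2f}")
```

Two rules can share these values exactly and still differ in biological plausibility, which is why the caption pairs them with the BP and TF measures.
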
Dataset details and problem description.
| GEO Identifier | Design | Intervention Details | Time records available | Nº subjects | BMI at the beginning of the study | Age | Sample tissue | Array Platform | Nº mined Strong ARs | Network representation |
|---|---|---|---|---|---|---|---|---|---|---|
| | Dietary Intervention | 1250 kcal/d during 12 weeks and a weight stable period of 4 weeks | 3 | 22 (12/10) | 28–35 kg/m² | 51.8 ± 1.9 | Abdominal Subcutaneous Adipose Tissue | | 40 | |
| | Dietary Intervention | 500 kcal/d during 5 weeks and a weight stable period of 4 weeks | 3 | 24 (13/11) | 28–35 kg/m² | 50.7 ± 1.5 | Abdominal Subcutaneous Adipose Tissue | | 301 | |
| | Dietary Intervention | Low-calorie diet of self-prepared foods for consecutive 5, 10 and 15% weight loss | 4 | 9 (8/1) | 37.9 ± 4.3 kg/m² | 44 ± 12 | Abdominal Subcutaneous Adipose Tissue | | 551 | |
| | Dietary Intervention | 1200 kcal/d during 3 months and a weight stable period of 4 weeks | 3 | 9 (6/3) | 42.7 ± 1.4 kg/m² | 40 ± 3.73 | Abdominal Subcutaneous Adipose Tissue | | 83 | |
| | Dietary Intervention + exercise counselling | 800–1000 kcal/d during 6 weeks and a less restrictive diet plan + exercise counselling for 12 months | 3 | 6 (3/3) | 34.64 (0.7) kg/m² | 21–48 | Abdominal Subcutaneous Adipose Tissue | | 870 | |
| | Dietary Intervention + exercise counselling | 800–1000 kcal/d during 6 weeks and a less restrictive diet plan + exercise counselling for 12 months | 3 | 13 (9/4) | 34.65 (0.85) kg/m² | 20–45 | Abdominal Subcutaneous Adipose Tissue | | 70 | |
BMI and age data are presented as mean ± SEM, mean (SE) or as a range.
* Datasets employed as discovery population.
Fig 3. Visual representation of the sequential rules discovered by our method in the GSE77962 dataset (LCD group).
Node names refer to (probe/gene).
Fig 4. Visual representation of the sequential rules discovered by our method in the GSE77962 dataset (VLCD group).
Node names refer to (probe/gene).
Descriptive statistics on quality metrics for strong association rules discovered in the whole GSE77962 dataset (LCD and VLCD groups).
| Group | Statistic | SUP | CONF | LIFT | CF | CONV | BP | MF | CC | SP | TF |
|---|---|---|---|---|---|---|---|---|---|---|---|
| VLCD | N | 301 | 301 | 301 | 301 | 301 | 301 | 301 | 301 | 301 | 301 |
| VLCD | Min | 11.00 | 0.71 | 1.13 | 0.22 | 1.27 | 1.001 | 1.001 | 1.001 | 1.2 | 0 |
| VLCD | Mean | 11.08 | 0.88 | 1.55 | 0.71 | | 2.19 | 1.94 | 2.17 | 3.8 | 0.21 |
| VLCD | SD | 0.27 | 0.09 | 0.23 | 0.22 | - | 1.1 | 0.48 | 1.06 | 2.4 | 0.41 |
| VLCD | Median | 11.00 | 0.85 | 1.56 | 0.69 | 3.25 | 1.91 | 1.9 | 1.91 | 6 | 0 |
| VLCD | Max | 12.00 | 1.00 | 2.00 | 1 | | 6 | 6 | 6 | 6 | 1 |
| LCD | N | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
| LCD | Min | 9.00 | 0.69 | 1.08 | 0.27 | 1.36 | 1.002 | 1.002 | 1.002 | 1.2 | 0 |
| LCD | Mean | 9.72 | 0.81 | 1.38 | 0.54 | | 2.39 | 1.77 | 2.71 | 4.56 | 0.1 |
| LCD | SD | 1.26 | 0.09 | 0.17 | 0.18 | - | 1.55 | 0.16 | 1.8 | 2.23 | 0.3 |
| LCD | Median | 9.00 | 0.82 | 1.38 | 0.55 | 2.21 | 1.84 | 1.84 | 1.84 | 6 | 0 |
| LCD | Max | 15 | 1 | 1.69 | 1 | | 6 | 1.96 | 6 | 6 | 1 |
Fig 5. Correlation between traditional quality metrics and biological quality measures by rule in the sequential rules discovered from the whole GSE77962 dataset (LCD and VLCD groups).
R² values quantify the level of correlation for each pair of measures. Statistical significance (after Bonferroni correction for multiple testing) is marked with an X for P-values > 0.05 and left blank for P-values < 0.05.
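
The marking scheme described in this caption can be sketched as follows, assuming the plain Bonferroni adjustment (each raw P-value multiplied by the number of tests and capped at 1). The raw P-values and the function names are hypothetical, and treating an adjusted value of exactly 0.05 as non-significant is an assumption, since the caption does not cover that case.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Multiply each P-value by the number of tests and cap the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significance_marks(p_values: List[float], alpha: float = 0.05) -> List[str]:
    """Return 'X' for non-significant pairs and '' for significant ones."""
    # Assumption: an adjusted P-value equal to alpha is marked as non-significant.
    return ["" if p < alpha else "X" for p in bonferroni(p_values)]


# Hypothetical raw P-values for four metric pairs.
raw_p = [0.001, 0.02, 0.04, 0.5]
adjusted_p = bonferroni(raw_p)
marks = significance_marks(raw_p)
print(list(zip(adjusted_p, marks)))
```

With four tests, only the first pair stays below 0.05 after adjustment, so the other three would carry an X in a plot of this kind.
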
Subset of biologically meaningful extracted sequential rules in the whole GSE77962 dataset (LCD and VLCD groups).
| Intervention Group | LHS | RHS | SUP | CONF | LIFT | CF | CONV | BP | MF | CC | SP | TF |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| VLCD | {8140556/HGF = 2} | {8146000/ADAM9 = 1} | 11.00 | 0.77 | 1.35 | 0.49 | 1.94 | 1.002 | 1.002 | 1.002 | 6.00 | 0.00 |
| VLCD | {7897068/SKI = 1} | {8146000/ADAM9 = 1} | 12.00 | 0.92 | 1.58 | 0.81 | 5.42 | 1.39 | 1.39 | 1.39 | 1.20 | 1.00 |
| VLCD | {8034940/NOTCH3 = 1} | {7981142/CLMN = 2} | 11.00 | 0.85 | 1.45 | 0.63 | 2.71 | 1.79 | 1.79 | 1.79 | 6.00 | 1.00 |
| LCD | {8034940/NOTCH3 = 1} | {8166079/EGFL6 = 1} | 12.00 | 0.86 | 1.11 | 0.37 | 1.59 | 1.79 | 1.79 | 1.79 | 6.00 | 1.00 |
| LCD | {7928872/SNCG = 1 & 8034940/NOTCH3 = 1} | {7932227/NMT2 = 1} | 9.00 | 0.90 | 1.52 | 0.76 | 4.09 | 1.83 | 1.83 | 1.83 | 6.00 | 1.00 |
| VLCD | {8131326/SLC29A4 = 1} | {8101992/SLC39A8 = 1} | 11.00 | 0.79 | 1.26 | 0.43 | 1.75 | 1.88 | 1.88 | 1.88 | 1.20 | 0.00 |
| VLCD | {8129045/HDAC2 = 2} | {8101992/SLC39A8 = 1} | 12.00 | 0.92 | 1.48 | 0.79 | 4.87 | 1.93 | 1.93 | 1.93 | 6.00 | 1.00 |
| VLCD | {7929201/BTAF1 = 2} | {8101992/SLC39A8 = 1} | 11.00 | 0.79 | 1.26 | 0.43 | 1.75 | 1.94 | 1.94 | 1.94 | 1.20 | 1.00 |
| VLCD | {7929201/BTAF1 = 2 & 7940153/FAM111A = 2} | {8101992/SLC39A8 = 1} | 11.00 | 0.92 | 1.47 | 0.78 | 4.50 | 1.94 | 1.94 | 1.94 | 1.20 | 1.00 |
| VLCD | {7929201/BTAF1 = 2 & 8106141/FCHO2 = 2} | {8101992/SLC39A8 = 1} | 11.00 | 0.85 | 1.35 | 0.59 | 2.44 | 1.95 | 1.95 | 1.95 | 1.20 | 1.00 |
| LCD | {8087224/SLC25A20 = 1 & 8034940/NOTCH3 = 1} | {8166079/EGFL6 = 1} | 9.00 | 1.00 | 1.29 | 1.00 | Inf | 1.96 | 1.96 | 1.96 | 6.00 | 1.00 |
| LCD | {7980970/ITPK1 = 1 & 8034940/NOTCH3 = 1} | {8032829/PLIN4 = 2} | 9.00 | 0.75 | 1.50 | 0.50 | 2.00 | 6.00 | 6.00 | 1.90 | 6.00 | 1.00 |
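
The LIFT, CF and CONV columns can be related to confidence through the standard textbook definitions of lift, certainty factor and conviction. The sketch below assumes those definitions; the exact formulas used by the authors are given in their methods section, which is not reproduced here. The function names and the input values are hypothetical.

```python
import math


def lift(conf: float, rhs_support: float) -> float:
    """Confidence relative to the baseline frequency of the RHS."""
    return conf / rhs_support


def certainty_factor(conf: float, rhs_support: float) -> float:
    """Change in belief about the RHS once the LHS is known, scaled to [-1, 1]."""
    if conf > rhs_support:
        return (conf - rhs_support) / (1.0 - rhs_support)
    if conf < rhs_support:
        return (conf - rhs_support) / rhs_support
    return 0.0


def conviction(conf: float, rhs_support: float) -> float:
    """Ratio of expected to observed rule failures; infinite when the rule never fails."""
    if conf >= 1.0:
        return math.inf
    return (1.0 - rhs_support) / (1.0 - conf)


# Hypothetical inputs: rule confidence 0.9, RHS relative support 0.6.
conf, rhs_support = 0.9, 0.6
print(lift(conf, rhs_support),
      certainty_factor(conf, rhs_support),
      conviction(conf, rhs_support))
```

Under these definitions, a rule with confidence 1.00 has infinite conviction and a certainty factor of 1, which matches the pattern of the table row that reports CONF = 1.00, CF = 1.00 and CONV = Inf.
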