Jennifer R S Meadows, Jan Komorowski, Sara A Yones, Alva Annett, Patricia Stoll, Klev Diamanti, Linda Holmfeldt, Carl Fredrik Barrenäs.
Abstract
Transcriptomic analyses are commonly used to identify differentially expressed genes between patients and controls, or within individuals across disease courses. These methods, whilst effective, cannot capture the combinatorial effects of genes driving disease. We applied rule-based machine learning (RBML) models and rule networks (RN) to an existing paediatric Systemic Lupus Erythematosus (SLE) blood expression dataset, with the goal of developing gene networks to separate low and high disease activity (DA1 and DA3). The resultant model distinguished DA1 from DA3 with 81% accuracy, and unsupervised hierarchical clustering revealed additional subgroups indicative of the immune axis involved or the state of disease flare. These subgroups correlated with clinical variables, suggesting that the gene sets identified may further the understanding of gene networks that act in concert to drive disease progression. This included roles for genes (i) induced by interferons (IFI35 and OTOF), (ii) key to SLE cell types (KLRB1 encoding CD161), or (iii) with roles in autophagy and NF-κB pathway responses (CKAP4). As demonstrated here, RBML approaches have the potential to reveal novel gene patterns from within a heterogeneous disease, facilitating patient clinical and therapeutic stratification.
Year: 2022 PMID: 35523803 PMCID: PMC9076598 DOI: 10.1038/s41598-022-10853-1
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Figure 1. Overview of the modelling process implemented to classify and interrogate gene expression relationships between DA1 and DA3.
Figure 2. The rule networks discern the disease states. DA1 is largely defined by medium gene expression, whereas DA3 includes more genes, including those that were highly expressed. For each decision class, internal node colour indicates discretised gene expression value (high, medium, low; orange, grey, blue), node size is proportional to the number of objects supporting rules associated with a node, node border thickness is proportional to the number of rules associated with a node (low, high; circle border thin, thick) and edges connecting nodes represent normalised connection values (< 55%, ≥ 85%; grey, red with increasing line thickness per support interval). The connection value is the strength of the co-appearance of connected nodes in rules supporting a decision class. The network was filtered to visualise rules with minimum support of 10% and rule p-value ≤ 0.05.
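The filtering and co-appearance ideas in the Figure 2 caption can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: the `Rule` record, its fields, the placeholder genes (`GENE_A`, `GENE_B`, `GENE_C`) and all numbers are hypothetical and are not taken from the study. Support is assumed to be a fraction of the objects in a decision class, and the co-appearance value is shown as a raw count only, since the caption does not give the normalisation used.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule record; field names are illustrative assumptions.
    conditions: tuple  # ((gene, discretised level), ...)
    decision: str      # decision class, e.g. "DA1" or "DA3"
    support: float     # assumed: fraction of class objects matching the rule
    p_value: float


def filter_rules(rules, min_support=0.10, max_p=0.05):
    """Keep rules meeting the caption's thresholds (support >= 10%, p <= 0.05)."""
    return [r for r in rules if r.support >= min_support and r.p_value <= max_p]


def co_appearance(rules):
    """Count how often each pair of (gene, level) nodes appears in the same rule."""
    counts = Counter()
    for rule in rules:
        for a, b in combinations(sorted(rule.conditions), 2):
            counts[(a, b)] += 1
    return counts


# Made-up example rules using placeholder gene names.
rules = [
    Rule((("GENE_A", "high"), ("GENE_B", "high")), "DA3", 0.32, 0.001),
    Rule((("GENE_C", "medium"),), "DA1", 0.08, 0.01),   # fails the support filter
    Rule((("GENE_B", "high"),), "DA3", 0.15, 0.20),     # fails the p-value filter
]

kept = filter_rules(rules)
edges = co_appearance(kept)
```

In this toy input only the first rule passes both thresholds, so the resulting network would contain a single edge between the two nodes of that rule.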
Figure 3. Hierarchical clustering of the model rules showed the major subdivision between the DA clusters. (a) Supported rules (black) and unsupported rules (grey) distinguish five disease subgroups that were projected into the (b) RN where group (cluster) membership is indicated by pie colour.
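The supported/unsupported pattern in Figure 3a can be thought of as a binary matrix of samples by rules. The sketch below builds such a matrix under the assumption that a rule is "supported" by a sample when every condition of the rule matches that sample's discretised expression level. The sample data, placeholder gene names and helper functions are hypothetical, and the distance measure (Hamming) is an illustrative choice rather than the one used in the study. A hierarchical clustering routine could take this matrix as input; that step is not shown.

```python
def matches(sample, conditions):
    """True if the sample satisfies every (gene, level) condition of a rule."""
    return all(sample.get(gene) == level for gene, level in conditions)


def support_matrix(samples, rules):
    """Rows are samples, columns are rules; 1 = supported, 0 = unsupported."""
    return [[1 if matches(s, conds) else 0 for conds in rules] for s in samples]


def hamming(a, b):
    """Number of rules on which two samples disagree."""
    return sum(x != y for x, y in zip(a, b))


# Made-up discretised expression profiles with placeholder gene names.
samples = [
    {"GENE_A": "high", "GENE_B": "high"},
    {"GENE_A": "high", "GENE_B": "low"},
    {"GENE_A": "medium", "GENE_B": "medium"},
]

# Each rule is given only by its conditions here.
rules = [
    (("GENE_A", "high"), ("GENE_B", "high")),
    (("GENE_A", "high"),),
    (("GENE_A", "medium"),),
]

matrix = support_matrix(samples, rules)
```

Samples with similar rows in this matrix support similar sets of rules, which is the kind of signal a clustering step would group on.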
Figure 4. Fraction of rules per cluster significantly associated with (a) continuous and (b) categorical phenotypes. See Supplementary Table S3 online for a list of clinical variables and phenotype abbreviations.