Klev Diamanti1,2, Karolina Smolińska1, Mateusz Garbulowski1, Nicholas Baltzer1,3, Patricia Stoll1,4, Susanne Bornelöv1,5, Aleksander Øhrn6, Lars Feuk2, Jan Komorowski7,8,9,10. 1. Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden. 2. Department of Immunology, Genetics and Pathology, Uppsala University, Uppsala, Sweden. 3. Department of Research, Cancer Registry of Norway, Oslo, Norway. 4. Department of Biosystems Science and Engineering, ETH Zurich, Zurich, Switzerland. 5. Cancer Research UK Cambridge Institute, University of Cambridge, Cambridge, UK. 6. Department of Informatics, University of Oslo, Oslo, Norway. 7. Department of Cell and Molecular Biology, Uppsala University, Uppsala, Sweden. 8. Swedish Collegium for Advanced Study, Uppsala, Sweden. 9. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland. 10. Washington National Primate Research Center, Seattle, WA, USA. Correspondence (Jan Komorowski): jan.komorowski@icm.uu.se.
Abstract
BACKGROUND: Machine learning provides strategies and algorithms that can assist bioinformatics analyses in data mining and knowledge discovery. In several applications, particularly in the life sciences, understanding how a prediction was obtained is often more important than knowing what prediction was made. To this end, interpretable machine learning has recently been advocated. In this study, we implemented an interpretable machine learning package based on rough set theory. An important aim of our work was to provide statistical properties of the models and their components.
RESULTS: We present the R.ROSETTA package, an R wrapper of the ROSETTA framework. The original ROSETTA functions have been improved and adapted to the R programming environment. The package allows for building and analyzing non-linear interpretable machine learning models. R.ROSETTA gathers combinatorial statistics via rule-based modelling to produce accessible and transparent results, well suited for adoption within the greater scientific community. The package also provides statistics and visualization tools that help minimize analysis bias and noise. The R.ROSETTA package is freely available at https://github.com/komorowskilab/R.ROSETTA. To illustrate the usage of the package, we applied it to a transcriptome dataset from an autism case-control study. Our tool provided hypotheses for potential co-predictive mechanisms among features that discerned phenotype classes. These co-predictors represented neurodevelopmental and autism-related genes.
CONCLUSIONS: R.ROSETTA provides new insights for interpretable machine learning analyses and knowledge-based systems. We demonstrated that our package facilitated detection of dependencies among autism-related genes. Although the sample application of R.ROSETTA illustrates transcriptome data analysis, the package can be used to analyze any data organized in decision tables.
Keywords:
Big data; Interpretable machine learning; R package; Rough sets; Rule-based classification; Transcriptomics
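The abstract refers to data "organized in decision tables" and to rule-based modelling. As a rough illustration only, the Python sketch below builds a tiny hand-made decision table and computes commonly used rule statistics (support, accuracy, coverage) for one IF-THEN rule. The feature names (`gene_a`, `gene_b`), the values, the rule, and the function `rule_stats` are all hypothetical; this is not the R.ROSETTA API, and the package's own statistics may be defined or reported differently.

```python
from typing import Dict, List

Row = Dict[str, str]

# Hypothetical decision table: each row is an object (sample), with two
# discretized condition features and one decision (class) column.
TABLE: List[Row] = [
    {"gene_a": "high", "gene_b": "low", "decision": "case"},
    {"gene_a": "high", "gene_b": "low", "decision": "case"},
    {"gene_a": "high", "gene_b": "low", "decision": "control"},
    {"gene_a": "low", "gene_b": "low", "decision": "control"},
    {"gene_a": "low", "gene_b": "high", "decision": "control"},
    {"gene_a": "high", "gene_b": "high", "decision": "case"},
]


def rule_stats(table: List[Row], conditions: Dict[str, str], decision: str) -> Dict[str, float]:
    """Statistics for the rule: IF all conditions hold THEN decision.

    Assumed (textbook-style) definitions:
      lhs_support = number of rows matching the IF part
      rhs_support = number of those rows that also have the stated decision
      accuracy    = rhs_support / lhs_support
      coverage    = rhs_support / number of rows in the decision class
    """
    lhs = [r for r in table if all(r[k] == v for k, v in conditions.items())]
    both = [r for r in lhs if r["decision"] == decision]
    in_class = [r for r in table if r["decision"] == decision]
    return {
        "lhs_support": len(lhs),
        "rhs_support": len(both),
        "accuracy": len(both) / len(lhs) if lhs else 0.0,
        "coverage": len(both) / len(in_class) if in_class else 0.0,
    }


# Hypothetical rule: IF gene_a = high AND gene_b = low THEN decision = case
stats = rule_stats(TABLE, {"gene_a": "high", "gene_b": "low"}, "case")
print(stats)
```

A rule combining two features in this way is the kind of object from which co-predictive hypotheses could be read off: the rule holds only when both conditions are satisfied together.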