modelBuildR: an R package for model building and feature selection with erroneous classifications.

Maximilian Knoll, Jennifer Furkel, Juergen Debus, Amir Abdollahi.

Abstract

BACKGROUND: Model building is a crucial part of omics-based biomedical research to transfer classifications and obtain insights into underlying mechanisms. Feature selection is often based on minimizing the error between model predictions and a given classification (maximizing accuracy). Human ratings/classifications, however, might be error-prone, with discordance rates between experts of 5-15%. We therefore evaluate whether a feature pre-filtering step might improve identification of features associated with true underlying groups.
METHODS: Data were simulated for up to 100 samples and up to 10,000 features, 10% of which were associated with the ground truth, which comprised 2-10 normally distributed populations. Binary and semi-quantitative ratings with varying error probabilities were used as classifications. For feature preselection, standard cross-validation (V2) was compared to a novel heuristic (V1) that applies univariate testing, multiplicity adjustment and cross-validation with the dependent (classification) and independent (feature) variables switched. Preselected features were used to train logistic regression/linear models (backward selection, AIC). Predictions were compared against the ground truth (ROC, multiclass-ROC). As a use case, multiple feature selection/classification methods were benchmarked against the novel heuristic for identifying prognostically different G-CIMP negative glioblastoma tumors in the TCGA-GBM 450k methylation array data cohort, starting from a fuzzy, UMAP-based, rough and erroneous separation.
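The pre-filtering idea in METHODS can be illustrated with a minimal Python sketch. It is not the modelBuildR API, and every name in it (`simulate`, `prefilter`, `bh_adjust`, `two_sided_p`) is hypothetical. It assumes two true groups, a mean shift of 2.0 for informative features, a 10% rating error, and a normal approximation for the two-sample test. The cross-validation step and the backward selection by AIC are omitted.

```python
import math
import random
from statistics import NormalDist, mean, variance


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def two_sided_p(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    z = (mean(a) - mean(b)) / se
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def simulate(n_samples=100, n_features=200, frac_informative=0.1,
             shift=2.0, error_prob=0.1, seed=1):
    """Two true groups; the first 10% of features carry a mean shift."""
    rng = random.Random(seed)
    truth = [i % 2 for i in range(n_samples)]
    n_inf = int(n_features * frac_informative)
    features = []
    for j in range(n_features):
        delta = shift if j < n_inf else 0.0
        features.append([rng.gauss(delta * t, 1.0) for t in truth])
    # Error-prone rating: each true label is flipped with error_prob.
    rating = [1 - t if rng.random() < error_prob else t for t in truth]
    return features, truth, rating


def prefilter(features, rating, alpha=0.05):
    """Test each feature against the rating, then keep BH-significant ones."""
    pvals = []
    for x in features:
        g0 = [v for v, r in zip(x, rating) if r == 0]
        g1 = [v for v, r in zip(x, rating) if r == 1]
        pvals.append(two_sided_p(g0, g1))
    adj = bh_adjust(pvals)
    return [j for j, q in enumerate(adj) if q <= alpha]


features, truth, rating = simulate()
selected = prefilter(features, rating)
print(len(selected), "features pass the pre-filter")
```

Here each feature is treated as the outcome and the error-prone rating as the grouping variable, which is one reading of the "switched" variables described above. The indices in `selected` would then be passed on to the model-fitting step.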
RESULTS: V1 yielded higher median AUC ranks for two true groups (ground truth), with smaller differences for true graduated differences (3-10 groups). Lower fractions of models were successfully fit with V1. Median AUCs for binary classification and two true groups were 0.91 (range: 0.54-1.00) for V1 (Benjamini-Hochberg) and 0.70 (range: 0.28-1.00) for V2; 13% (n = 616) of V2 models showed AUCs ≤ 50% for 25 samples and 100 features. For larger numbers of features and samples, median AUCs were 0.75 (range: 0.59-1.00) for V1 and 0.54 (range: 0.32-0.75) for V2. In the TCGA-GBM data, modelBuildR allowed the best prognostic separation of patients, with the highest median overall survival difference (7.51 months), followed by a difference of 6.04 months for a random forest-based method.
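The AUC values above score model predictions against the ground truth rather than against the error-prone rating. As a rough Python illustration (the function name `auc` and the toy data are assumptions, not part of modelBuildR), the AUC for two groups can be computed as the fraction of positive/negative pairs that the predictions rank correctly, with ties counted as one half:

```python
def auc(scores, truth):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for s, t in zip(scores, truth) if t == 1]
    neg = [s for s, t in zip(scores, truth) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties contribute half a correctly ranked pair
    return wins / (len(pos) * len(neg))


# Toy predictions scored against the true groups.
true_groups = [1, 1, 0, 0]
predictions = [0.9, 0.4, 0.6, 0.1]
print(auc(predictions, true_groups))
```

An AUC of 0.5 corresponds to ranking no better than chance, which is why the share of models with AUCs ≤ 50% is informative.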
CONCLUSIONS: The proposed heuristic is beneficial for the retrieval of features associated with two true groups classified with errors. We provide the R package modelBuildR to simplify (comparative) evaluation/application of the proposed heuristic (http://github.com/mknoll/modelBuildR). ©2021 Knoll et al.

Keywords:  Feature selection; G-CIMP negative GBM; Glioblastoma multiforme; Ground truth; High dimensional data; Illumina humanmethylation array data; Long term/short term survivor; Misclassification; Model building; Prognosis

Year: 2021   PMID: 33614290   PMCID: PMC7879945   DOI: 10.7717/peerj.10849

Source DB: PubMed   Journal: PeerJ   ISSN: 2167-8359   Impact factor: 2.984

