Stability metrics for multi-source biomedical data based on simplicial projections from probability distribution distances.

Carlos Sáez, Montserrat Robles, Juan M García-Gómez.

Abstract

Biomedical data may be composed of individuals generated from distinct, meaningful sources. Due to possible contextual biases in the processes that generate data, there may exist an undesirable and unexpected variability among the probability distribution functions (PDFs) of the source subsamples, which, when uncontrolled, may lead to inaccurate or unreproducible research results. Classical statistical methods may have difficulty uncovering such variability when dealing with multi-modal, multi-type, multi-variate data. This work proposes two metrics for the analysis of stability among multiple data sources, robust to the aforementioned conditions, and defined in the context of data quality assessment. Specifically, a global probabilistic deviation metric and a source probabilistic outlyingness metric are proposed. The first provides a bounded degree of the global multi-source variability, designed as an estimator equivalent to the notion of normalized standard deviation of PDFs. The second provides a bounded degree of the dissimilarity of each source to a latent central distribution. The metrics are based on the projection of a simplex geometrical structure constructed from the Jensen-Shannon distances among the sources' PDFs. The metrics have been evaluated, and their correct behaviour demonstrated, on a simulated benchmark and on real multi-source biomedical data from the UCI Heart Disease data set. Biomedical data quality assessment based on the proposed stability metrics may improve the efficiency and effectiveness of biomedical data exploitation and research.
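
The construction summarised in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`js_distance`, `simplex_coordinates`, `stability_sketch`), the use of classical multidimensional scaling to place the sources as simplex vertices, and both normalising constants are assumptions made here so that the two values fall in [0, 1]. The exact definitions in the paper may differ. The sketch assumes each source is already summarised as a discrete PDF over the same bins.

```python
import numpy as np


def js_distance(p, q):
    """Jensen-Shannon distance with base-2 logs, so it lies in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)

    def kl(a, b):
        mask = a > 0  # terms with a == 0 contribute nothing
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))

    return np.sqrt(max(0.5 * kl(p, m) + 0.5 * kl(q, m), 0.0))


def simplex_coordinates(dist):
    """Classical MDS: place one vertex per source so that Euclidean
    distances between vertices reproduce the pairwise distances."""
    n = dist.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(gram)
    keep = eigvals > 1e-12  # drop null and numerically negative axes
    return eigvecs[:, keep] * np.sqrt(eigvals[keep])


def stability_sketch(pdfs):
    """Return (global deviation, per-source outlyingness).

    Assumed normalisations, chosen here for boundedness:
    - deviation: RMS distance to the centroid divided by the circumradius
      of a regular simplex with unit edges (n vertices);
    - outlyingness: distance to the centroid divided by (n - 1) / n, the
      largest value it can take when all pairwise distances are <= 1.
    """
    pdfs = np.asarray(pdfs, dtype=float)
    n = len(pdfs)
    dist = np.array([[js_distance(p, q) for q in pdfs] for p in pdfs])
    coords = simplex_coordinates(dist)
    centroid = coords.mean(axis=0)
    to_centroid = np.linalg.norm(coords - centroid, axis=1)
    circumradius = np.sqrt((n - 1) / (2.0 * n))
    deviation = np.sqrt(np.mean(to_centroid ** 2)) / circumradius
    outlyingness = to_centroid / ((n - 1) / n)
    return deviation, outlyingness


# Hypothetical example: three sources, the third shifted from the others.
sources = [
    [0.50, 0.30, 0.20],
    [0.48, 0.32, 0.20],
    [0.10, 0.20, 0.70],
]
deviation, outlyingness = stability_sketch(sources)
print("global deviation:", round(float(deviation), 3))
print("source outlyingness:", np.round(outlyingness, 3))
```

Under these assumptions, identical sources give a deviation of 0 and mutually disjoint sources give a deviation of 1; the source with the largest outlyingness is the one farthest from the centroid of the simplex, which stands in here for the latent central distribution.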

Keywords:  data quality; data reuse; data variability; information geometry; probability distribution distances

Year:  2016        PMID: 25091808     DOI: 10.1177/0962280214545122

Source DB:  PubMed          Journal:  Stat Methods Med Res        ISSN: 0962-2802            Impact factor:   3.021


  9 in total

Review 1.  The future of sleep health: a data-driven revolution in sleep science and medicine.

Authors:  Ignacio Perez-Pozuelo; Bing Zhai; Joao Palotti; Raghvendra Mall; Michaël Aupetit; Juan M Garcia-Gomez; Shahrad Taheri; Yu Guan; Luis Fernandez-Luque
Journal:  NPJ Digit Med       Date:  2020-03-23

2.  Bias Analysis on Public X-Ray Image Datasets of Pneumonia and COVID-19 Patients.

Authors:  Omar Del Tejo Catala; Ismael Salvador Igual; Francisco Javier Perez-Benito; David Millan Escriva; Vicent Ortiz Castello; Rafael Llobet; Juan-Carlos Perez-Cortes
Journal:  IEEE Access       Date:  2021-03-10       Impact factor: 3.476

3.  Subphenotyping of Mexican Patients With COVID-19 at Preadmission To Anticipate Severity Stratification: Age-Sex Unbiased Meta-Clustering Technique.

Authors:  Lexin Zhou; Nekane Romero-García; Juan Martínez-Miranda; J Alberto Conejero; Juan M García-Gómez; Carlos Sáez
Journal:  JMIR Public Health Surveill       Date:  2022-03-30

4.  Automated glioblastoma segmentation based on a multiparametric structured unsupervised classification.

Authors:  Javier Juan-Albarracín; Elies Fuster-Garcia; José V Manjón; Montserrat Robles; F Aparici; L Martí-Bonmatí; Juan M García-Gómez
Journal:  PLoS One       Date:  2015-05-15       Impact factor: 3.240

5.  Temporal variability analysis reveals biases in electronic health records due to hospital process reengineering interventions over seven years.

Authors:  Francisco Javier Pérez-Benito; Carlos Sáez; J Alberto Conejero; Salvador Tortajada; Bernardo Valdivieso; Juan M García-Gómez
Journal:  PLoS One       Date:  2019-08-07       Impact factor: 3.240

6.  Do population-level risk prediction models that use routinely collected health data reliably predict individual risks?

Authors:  Yan Li; Matthew Sperrin; Miguel Belmonte; Alexander Pate; Darren M Ashcroft; Tjeerd Pieter van Staa
Journal:  Sci Rep       Date:  2019-08-02       Impact factor: 4.379

Review 7.  The future of sleep health: a data-driven revolution in sleep science and medicine.

Authors:  Ignacio Perez-Pozuelo; Bing Zhai; Joao Palotti; Raghvendra Mall; Michaël Aupetit; Juan M Garcia-Gomez; Shahrad Taheri; Yu Guan; Luis Fernandez-Luque
Journal:  NPJ Digit Med       Date:  2020-03-23

8.  Data-driven discovery of changes in clinical code usage over time: a case-study on changes in cardiovascular disease recording in two English electronic health records databases (2001-2015).

Authors:  Patrick Rockenschaub; Vincent Nguyen; Robert W Aldridge; Dionisio Acosta; Juan Miguel García-Gómez; Carlos Sáez
Journal:  BMJ Open       Date:  2020-02-13       Impact factor: 2.692

9.  New Horizons in the use of routine data for ageing research.

Authors:  Oliver M Todd; Jennifer K Burton; Richard M Dodds; Joe Hollinghurst; Ronan A Lyons; Terence J Quinn; Anna Schneider; Katherine E Walesby; Chris Wilkinson; Simon Conroy; Chris P Gale; Marlous Hall; Kate Walters; Andrew P Clegg
Journal:  Age Ageing       Date:  2020-08-24       Impact factor: 10.668
