Vasileios C Pezoulas (1), Konstantina D Kourou (2), Fanis Kalatzis (1), Themis P Exarchos (3), Aliki I Venetsanopoulou (4), Evi Zampeli (5), Saviana Gandolfo (6), Fotini N Skopouli (7), Salvatore De Vita (6), Athanasios G Tzioufas (4), Dimitrios I Fotiadis (8).
1. Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, Greece.
2. Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, and Department of Biological Applications and Technology, University of Ioannina, Greece.
3. Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, and Department of Informatics, Ionian University, Corfu, Greece.
4. Department of Pathophysiology, School of Medicine, University of Athens, Greece.
5. Institute for Systemic Autoimmune and Neurological Diseases, Athens, Greece.
6. Clinic of Rheumatology, Department of Medical and Biological Sciences, University of Udine, Italy.
7. Department of Internal Medicine and Clinical Immunology, Euroclinic Hospital, Athens, Greece.
8. Unit of Medical Technology and Intelligent Information Systems, Department of Materials Science and Engineering, University of Ioannina, and Department of Biomedical Research, FORTH-IMBB, Ioannina, Greece. fotiadis@cc.uoi.gr
Abstract
OBJECTIVES: To address the need for automated assessment of clinical data quality in terms of accuracy, relevance, conformity, and completeness, by developing and applying a method that detects problematic fields and matches clinical terms within a specific domain. METHODS: The proposed methodology automatically constructs three diagnostic reports that summarise the type and range of each term in the dataset, along with the detected outliers, inconsistencies, and missing values, followed by a set of clinically relevant terms based on a reference model, that is, a set of terms describing the domain knowledge of the disease of interest. RESULTS: A case study was conducted using anonymised data from 250 patients diagnosed with primary Sjögren's syndrome (pSS), yielding reliable outcomes that were highlighted for clinical evaluation. The method identified 28 features with detected outliers and unknown data types, and also identified outliers, missing values, similar terms, and inconsistencies within the dataset. The data standardisation method matched 76 of 85 (89.41%) pSS-related terms according to a standard pSS reference model introduced by the clinicians. CONCLUSIONS: Our results confirm the clinical value of the data curation method for improving dataset quality through the precise identification of outliers, missing values, inconsistencies, and similar terms, and through the automated detection of pSS-related terms for data standardisation.
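The two steps named in the abstract, per-field quality reporting and matching dataset terms against a reference model, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state which outlier rule or similarity measure was used, so the interquartile-range rule and the `difflib` string similarity below are stand-in assumptions, as are the function names, the 0.8 threshold, and the example field names.

```python
from difflib import SequenceMatcher
from statistics import quantiles


def is_missing(value):
    """Treat None and blank strings as missing (an assumption)."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def quality_report(records):
    """Report type, missing count, and IQR outliers for each field."""
    fields = sorted({name for row in records for name in row})
    report = {}
    for field in fields:
        values = [row.get(field) for row in records]
        present = [v for v in values if not is_missing(v)]
        numeric = [v for v in present
                   if isinstance(v, (int, float)) and not isinstance(v, bool)]
        if present and len(numeric) == len(present):
            kind = "numeric"
        elif present and all(isinstance(v, str) for v in present):
            kind = "text"
        else:
            kind = "unknown"  # empty or mixed-type field
        outliers = []
        if kind == "numeric" and len(numeric) >= 4:
            # Assumed rule: flag values beyond 1.5 * IQR from the quartiles.
            q1, _, q3 = quantiles(numeric, n=4, method="inclusive")
            spread = q3 - q1
            low, high = q1 - 1.5 * spread, q3 + 1.5 * spread
            outliers = [v for v in numeric if v < low or v > high]
        report[field] = {
            "type": kind,
            "missing": len(values) - len(present),
            "outliers": outliers,
        }
    return report


def normalise(term):
    """Lower-case a term and treat '_' and '-' as spaces."""
    return " ".join(term.lower().replace("_", " ").replace("-", " ").split())


def match_terms(dataset_terms, reference_terms, threshold=0.8):
    """Map each dataset term to its most similar reference term, if any."""
    matches = {}
    for term in dataset_terms:
        best, best_score = None, 0.0
        for ref in reference_terms:
            score = SequenceMatcher(None, normalise(term), normalise(ref)).ratio()
            if score > best_score:
                best, best_score = ref, score
        if best_score >= threshold:
            matches[term] = best
    return matches


# Hypothetical records; the field names are illustrative only.
records = [
    {"age": 54, "esr": 20, "dry_mouth": "yes"},
    {"age": 61, "esr": 25, "dry_mouth": ""},
    {"age": 58, "esr": 22, "dry_mouth": "no"},
    {"age": 57, "esr": 24, "dry_mouth": "yes"},
    {"age": 560, "esr": None, "dry_mouth": "yes"},
]
report = quality_report(records)
matches = match_terms(
    ["Dry_Mouth", "dry eyes", "ESR", "hospital_code"],
    ["dry mouth", "dry eyes", "esr"],
)
```

Here `report` holds one entry per field and `matches` maps each dataset term to a reference term when the similarity reaches the threshold. Terms with no close counterpart are left unmatched so that they can be reviewed manually.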
Authors: Vasileios C Pezoulas; Konstantina D Kourou; Fanis Kalatzis; Themis P Exarchos; Evi Zampeli; Saviana Gandolfo; Andreas Goules; Chiara Baldini; Fotini Skopouli; Salvatore De Vita; Athanasios G Tzioufas; Dimitrios I Fotiadis
Journal: IEEE Open J Eng Med Biol
Date: 2020-03-16