| Literature DB >> 35402941 |
Vasileios C Pezoulas1, Konstantina D Kourou1,2, Fanis Kalatzis1, Themis P Exarchos3,4, Evi Zampeli5, Saviana Gandolfo6, Andreas Goules7, Chiara Baldini8, Fotini Skopouli9, Salvatore De Vita6, Athanasios G Tzioufas7, Dimitrios I Fotiadis1,10.
Abstract
Goal: To present a framework for data sharing, curation, harmonization and federated data analytics to solve open issues in healthcare, such as the development of robust disease prediction models.
Keywords: Data sharing; data curation; data harmonization; federated data analytics; incremental learning
Year: 2020 PMID: 35402941 PMCID: PMC8940202 DOI: 10.1109/OJEMB.2020.2981258
Source DB: PubMed Journal: IEEE Open J Eng Med Biol ISSN: 2644-1276
Fig. 1. The distribution of the first principal component for each harmonized cohort dataset against the integrated dataset.
Fig. 2. The distribution of the second principal component for each harmonized cohort dataset against the integrated dataset.
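The comparison in Figs. 1-2 can be reproduced in outline: fit principal components on the integrated dataset and project each harmonized cohort onto the same axes to check distribution overlap. A minimal sketch with synthetic data (the paper's cohorts are not public; cohort names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
cohorts = {"cohort_A": rng.normal(size=(100, 5)),
           "cohort_B": rng.normal(size=(100, 5)) + 0.5}
integrated = np.vstack(list(cohorts.values()))

# PCA via SVD of the mean-centered integrated dataset
mean = integrated.mean(axis=0)
_, _, vt = np.linalg.svd(integrated - mean, full_matrices=False)
components = vt[:2]  # first two principal components

for name, data in cohorts.items():
    scores = (data - mean) @ components.T  # project cohort onto shared PCs
    print(name, scores.shape)
```

The distributions of `scores[:, 0]` and `scores[:, 1]` per cohort correspond to the quantities plotted against the integrated dataset.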
Fig. 3. Receiver operating characteristic (ROC) curves for each incremental learning algorithm based on the same training-testing setting.
Fig. 4. The decision tree that is induced by the XGBoost schema.
Fig. 5. The overall analysis workflow, which consists of the following modules: (i) data sharing assessment module, (ii) data curation module, (iii) data harmonization module, and (iv) distributed data analytics module. The workflow receives as input cohort data which are stored in secure private databases within a cloud environment. The outcomes include highly-qualified data, harmonized data, disease ontologies, disease prediction models, etc.
Fig. 6. The workflow of the data harmonization module, which consists of the following steps: (i) reference model construction, (ii) ontology construction, (iii) terminology extraction, (iv) medical corpus definition, (v) lexical matching, and (vi) semantic matching.
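The lexical-matching step (v) can be illustrated as matching cohort variable names against reference-model terms by string similarity. The terms and the 0.6 cutoff below are illustrative assumptions, not taken from the paper:

```python
from difflib import SequenceMatcher

reference_terms = ["dry mouth", "dry eyes", "anti-SSA antibody"]
cohort_terms = ["dry mouth", "eye dryness", "unrelated variable"]

def best_match(term, candidates, cutoff=0.6):
    """Return the reference term with the highest similarity ratio, or None."""
    scored = [(SequenceMatcher(None, term.lower(), c.lower()).ratio(), c)
              for c in candidates]
    score, match = max(scored)
    return match if score >= cutoff else None

for t in cohort_terms:
    print(t, "->", best_match(t, reference_terms))
```

Unmatched terms would then be handled by the semantic-matching step (vi), which the paper places after lexical matching.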
Fig. 7. An illustration of the incremental learning strategy within a cloud environment. The data are uploaded into private cloud spaces. The incremental learning model is incrementally updated across the training cohort data through the CCE and is finally evaluated on the testing cohort. The results are distributed to all the cohorts that participated in the training-testing setting. CCE: Central Computing Engine, PS: Private Space.