Arnaud Jf Installé, Thierry Van den Bosch, Bart De Moor, Dirk Timmerman.
Abstract
BACKGROUND: Clinical diagnostic model research uses machine-learning techniques to extract diagnostic models from patient data. Traditionally, patient data are collected using electronic Case Report Form (eCRF) systems, while mathematical software is used to analyze these data with machine-learning techniques. Because of the lack of integration between eCRF systems and mathematical software, extracting diagnostic models is a complex, error-prone process. Moreover, because of this complexity, the process is usually performed only once, after a predetermined number of data points have been collected, without insight into the predictive performance of the resulting models.
Keywords: clinical decision support systems; data analysis; data collection; machine-learning
Year: 2014 PMID: 25600863 PMCID: PMC4288112 DOI: 10.2196/medinform.3251
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Typical workflow of clinical diagnostic model research. The Clinical Data Miner software framework improves support for the steps indicated in green. Support for steps marked in blue is planned for future work. (Abbreviations used: CRF=case report form; eCRF=electronic CRF; API=application programming interface.)
Figure 2. In Clinical Data Miner (CDM)'s layered architecture, module cdm-common contains functionality common to client and server. The server code is implemented in module cdm-server, while client code is further split into user interface logic (cdm-client) and user interface presentation (cdm-client-gwt). Finally, cdm-webapp combines the modules and provides CDM's entry point.
Figure 3. The DataManager application programming interface includes methods to access and preprocess data.
Figure 4. Unified Modeling Language diagram of Clinical Data Miner (CDM)'s machine-learning application programming interfaces. ClassifierFacade is the entry point to CDM's machine-learning functionality, which operates on Classifier objects to obtain Model objects.
Figure 5. Clinical Data Miner (CDM)'s data collection user interface. The ability to include pictograms in case report forms is particularly useful for variables obtained from imaging modalities.
Number of patient entries collected by CDM for the IETA studies, between May 2011 and September 2014.
| IETA | Complete entries | Total entries |
| #1 | 1600 | 2069 |
| #3 | 641 | 787 |
| #4 | 891 | 1179 |
| Total | 3132 | 4035 |
List of interrater agreement studies organized using a CDM user interface modified to support such studies.
| # | Study | Phases | Reference |
| 1 | Improvement of interrater agreement through pictograms | With and without pictograms | [ |
| 2 | Endo-myometrial junction | 1 and 2 | [ |
| 3 | Polycystic ovaries | 1, 2a, 2b | [ |
| 4 | Uterine anomalies | - | [ |
| 5 | IETA 2 | - | Not yet published^a |
| 6 | Contrast enhancement study | With and without enhanced contrast images | Not yet published^b |
^a Data were collected between July 2012 and February 2013. Authors: L Valentin, A Installé, P Sladkevicius, D Timmerman, B Benacerraf, L Jokubkiene, A diLegge, A Votino, L Zannoni, and T Van den Bosch.
^b Data were collected between May 2013 and February 2014. Authors: A Sayasneh, A Installé, D Timmerman, T Van den Bosch, T Bourne, S Guerriero, F Rizzello, LPG Francesco, MA Pascual, A Rossi, A Czekierdowski, A Testa, E Coccia, and A Smith.
Figure 6. Learning curves, plotting predictive performance with respect to number of patient inclusions, can easily be generated using Clinical Data Miner (CDM)'s libraries. (Abbreviations: AUC=area under the ROC curve; ROC=receiver operating characteristic.)
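As a rough illustration of what such a learning curve computes, the sketch below trains a toy classifier on growing prefixes of the collected entries and reports the AUC on a held-out set. It does not use CDM's libraries: the synthetic data, the nearest-centroid scorer, and all function names (`make_entries`, `fit`, `score`, `learning_curve`) are assumptions made for this example only.

```python
import random

def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))

def make_entries(n, rng):
    """Synthetic (features, label) entries; labels alternate so every prefix
    of two or more entries contains both classes (hypothetical data)."""
    entries = []
    for i in range(n):
        label = i % 2
        features = [rng.gauss(1.0 if label else 0.0, 1.0) for _ in range(2)]
        entries.append((features, label))
    return entries

def fit(train):
    """Toy learner: per-class feature means (a stand-in for a real classifier)."""
    centroids = {}
    for label in (0, 1):
        rows = [f for f, y in train if y == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def score(centroids, features):
    """Higher score means closer to the positive centroid than to the negative one."""
    def sq_dist(center):
        return sum((x - m) ** 2 for x, m in zip(features, center))
    return sq_dist(centroids[0]) - sq_dist(centroids[1])

def learning_curve(entries, test, sizes):
    """Return (number of inclusions, held-out AUC) for each training-set size."""
    test_labels = [y for _, y in test]
    curve = []
    for n in sizes:
        model = fit(entries[:n])  # train on the first n inclusions only
        scores = [score(model, f) for f, _ in test]
        curve.append((n, auc(scores, test_labels)))
    return curve

rng = random.Random(0)
collected = make_entries(200, rng)
held_out = make_entries(200, rng)
curve = learning_curve(collected, held_out, [10, 25, 50, 100, 200])
for n, a in curve:
    print(f"{n:4d} inclusions: AUC = {a:.3f}")
```

Plotting the returned pairs, with inclusions on the horizontal axis and AUC on the vertical axis, gives a curve of the kind the figure describes. A curve that has flattened suggests that further inclusions are unlikely to improve predictive performance much.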
Breakdown per module of number of source lines of code (SLOC) and line and branch test coverage ratios, as determined by the sloccount and Cobertura programs, respectively.
| Module | Production code (SLOC^a) | Test code (SLOC) | Line coverage, n (%) | Branch coverage, n (%) |
| cdm-common | 5862 | 7023 | 1800/1957 (91.98) | 459/486 (94.4) |
| cdm-server | 15,260 | 28,109 | 5781/6250 (92.50) | 1437/1577 (91.12) |
| cdm-client | 3595 | 7607 | 1128/1269 (88.89) | 133/146 (91.1) |
| cdm-client-gwt | 4090 | 5123 | 957/1828 (52.35) | 137/321 (42.7) |
| cdm-webapp | 321 | 177 | 38/111 (34.2) | 2/2 (100) |
| Total | 29,128 | 48,039 | - | - |
| Weighted average | - | - | 9704/11,415 (85.01) | 2168/2532 (85.62) |
^a Note that interfaces contribute to SLOC, but not to the number of lines analyzed for line coverage, leading to different counts for number of lines in the “Production code” and “Line coverage” columns.
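The "Weighted average" row can be reproduced by pooling the covered and analyzed counts across modules, which is the same as weighting each module's percentage by its number of analyzed lines or branches. The short check below uses only the counts from the table; the dictionary layout and the function name `pooled_coverage` are choices made for this sketch.

```python
# (covered, analyzed) counts per module, copied from the coverage table
line_coverage = {
    "cdm-common": (1800, 1957),
    "cdm-server": (5781, 6250),
    "cdm-client": (1128, 1269),
    "cdm-client-gwt": (957, 1828),
    "cdm-webapp": (38, 111),
}
branch_coverage = {
    "cdm-common": (459, 486),
    "cdm-server": (1437, 1577),
    "cdm-client": (133, 146),
    "cdm-client-gwt": (137, 321),
    "cdm-webapp": (2, 2),
}

def pooled_coverage(per_module):
    """Sum covered and analyzed counts, then return (covered, analyzed, percent)."""
    covered = sum(c for c, _ in per_module.values())
    analyzed = sum(t for _, t in per_module.values())
    return covered, analyzed, 100.0 * covered / analyzed

def unweighted_mean(per_module):
    """Plain mean of per-module percentages, shown only for contrast."""
    return sum(100.0 * c / t for c, t in per_module.values()) / len(per_module)

for name, table in (("line", line_coverage), ("branch", branch_coverage)):
    covered, analyzed, pct = pooled_coverage(table)
    print(f"{name}: {covered}/{analyzed} ({pct:.2f}%), "
          f"unweighted mean {unweighted_mean(table):.2f}%")
```

The pooled figure is dominated by cdm-server, the largest module, which is why it sits well above the low coverage of cdm-client-gwt and cdm-webapp.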
Average agreement levels with survey propositions among respondents.
| Proposition | Average agreement^a |
| CDM is user-friendly. | 8.6 |
| The layout of studies is clear. | 8.6 |
| The VAS^b is user-friendly. | 8.1 |
| CDM's VAS^b is a good alternative to a paper VAS^b. | 8.2 |
| Pictograms help to clarify questions. | 9.4 |
| Pictograms help to differentiate multiple choice questions. | 9.2 |
| Pictograms next to multiple choice options will improve reliability. | 9.3 |
^a 0 = no agreement; 10 = full agreement
^b VAS = visual analog scale
Figure 7. Distribution of respondents over different ranges of issue frequencies. A large majority, 79% (22/28), of survey participants experienced problems in less than 5% of their interactions with Clinical Data Miner.