Marc D Kohli, Ronald M Summers, J Raymond Geis.
Abstract
At the first annual Conference on Machine Intelligence in Medical Imaging (C-MIMI), held in September 2016, a conference session on medical image data and datasets for machine learning identified multiple issues. The common theme from attendees was that everyone participating in medical image evaluation with machine learning is data starved. There is an urgent need to find better ways to collect, annotate, and reuse medical imaging data. Unique domain issues with medical image datasets require further study, development, and dissemination of best practices and standards, and a coordinated effort among medical imaging domain experts, medical imaging informaticists, government and industry data scientists, and interested commercial, academic, and government entities. High-level attributes of reusable medical image datasets suitable to train, test, validate, verify, and regulate machine learning products should be better described. NIH and other government agencies should promote and, where applicable, enforce access to medical image datasets. We should improve communication among medical imaging domain experts, medical imaging informaticists, academic clinical and basic science researchers, government and industry data scientists, and interested commercial entities.
Keywords: Imaging informatics; Machine learning; Medical data; Medical image datasets; Medical imaging; Radiology
Year: 2017 PMID: 28516233 PMCID: PMC5537092 DOI: 10.1007/s10278-017-9976-3
Source DB: PubMed Journal: J Digit Imaging ISSN: 0897-1889 Impact factor: 4.056
Examples of commonly available metadata
| Element | Source | Example | Storage location |
|---|---|---|---|
| PatientName | EHR/ADT | MARY^JONES^B | DICOM header |
| PatientID | EHR/ADT | 1232391-3 | DICOM header |
| StudyDescription | RIS | CT BRAIN W/O | DICOM header |
| Rows | Imaging modality | 512 | DICOM header |
| Columns | Imaging modality | 512 | DICOM header |
| BitsStored | Imaging modality | 12 | DICOM header |
| Key Images | Radiologist | | DICOM Key Object |
| Measurement | Radiologist | | Various (AIM, DICOM PS, DICOM SR) |
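ADT admission, discharge, and transfer; AIM Annotation and Image Markup; DICOM PS DICOM Presentation State; DICOM SR DICOM Structured Report; EHR electronic health record; RIS radiology information system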
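Every element in the table lives in the DICOM header under a fixed (group, element) tag, which is what makes this metadata "commonly available" to cataloging tools. The sketch below assumes the header has already been parsed into a plain dict keyed by tag tuples; the function name and that input shape are hypothetical and do not correspond to any particular DICOM toolkit's API. The tag numbers are the standard DICOM data-dictionary tags for these keywords.

```python
from typing import Any

# Standard DICOM data-dictionary tags for the elements in the table.
DICOM_TAGS: dict[tuple[int, int], str] = {
    (0x0010, 0x0010): "PatientName",
    (0x0010, 0x0020): "PatientID",
    (0x0008, 0x1030): "StudyDescription",
    (0x0028, 0x0010): "Rows",
    (0x0028, 0x0011): "Columns",
    (0x0028, 0x0101): "BitsStored",
}

# Originating system for each element, copied from the table.
ELEMENT_SOURCE: dict[str, str] = {
    "PatientName": "EHR/ADT",
    "PatientID": "EHR/ADT",
    "StudyDescription": "RIS",
    "Rows": "Imaging modality",
    "Columns": "Imaging modality",
    "BitsStored": "Imaging modality",
}


def summarize_header(header: dict[tuple[int, int], Any]) -> dict[str, Any]:
    """Hypothetical helper: map known tags to keywords and derive simple properties.

    `header` is assumed to be an already-parsed header: (group, element) -> value.
    Tags not listed in DICOM_TAGS are ignored.
    """
    named = {DICOM_TAGS[tag]: value for tag, value in header.items() if tag in DICOM_TAGS}
    summary: dict[str, Any] = {"elements": named}
    if "Rows" in named and "Columns" in named:
        # Pixels in a single 2D frame
        summary["pixels_per_image"] = int(named["Rows"]) * int(named["Columns"])
    if "BitsStored" in named:
        # Number of distinct values the stored bits can encode
        summary["distinct_values"] = 2 ** int(named["BitsStored"])
    # Which upstream systems contributed to this header
    summary["sources"] = sorted({ELEMENT_SOURCE[k] for k in named})
    return summary
```

With Rows = Columns = 512 and BitsStored = 12, as in the table, this reports 262,144 pixels per image and 4,096 possible stored values.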
Fig. 1 Three different types of image annotations: anatomic region of interest segmentations (a), pathology region of interest segmentation such as this urinary calculus (b), and measurements (c)
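Segmentations and measurements like those in Fig. 1 are stored in pixel coordinates and only become clinically meaningful once converted to physical units. A minimal sketch, assuming a 2D image with known pixel spacing in millimetres (ordered row spacing, column spacing, as in the DICOM PixelSpacing attribute) and a segmentation supplied as a plain 0/1 mask; both function names are hypothetical.

```python
import math


def line_length_mm(p0, p1, pixel_spacing):
    """Physical length of a linear measurement between two pixel positions.

    p0, p1: (row, column) pixel coordinates of the measurement endpoints.
    pixel_spacing: (row_spacing_mm, column_spacing_mm).
    """
    d_row = (p1[0] - p0[0]) * pixel_spacing[0]
    d_col = (p1[1] - p0[1]) * pixel_spacing[1]
    return math.hypot(d_row, d_col)


def mask_area_mm2(mask, pixel_spacing):
    """Area of a binary region-of-interest mask given as rows of 0/1 values."""
    n_pixels = sum(1 for row in mask for value in row if value)
    # Each pixel covers row_spacing x column_spacing square millimetres
    return n_pixels * pixel_spacing[0] * pixel_spacing[1]
```

For example, a measurement spanning 3 rows and 4 columns on an image with 0.5 mm isotropic spacing is 2.5 mm long.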
Baseline metadata to catalog medical image data
| # | Metadata item | Sub-items |
|---|---|---|
| 1 | Image types | a. Modality; b. Resolution; c. Number of images total and by series |
| 2 | Number of imaging examinations | |
| 3 | Image examination source(s) | |
| 4 | Image acquisition parameters | |
| 5 | Image storage parameters (e.g., compression amount and type) | |
| 6 | Annotation | a. Type; b. What is annotated, and how |
| 7 | Context | |
| 8 | How is ground truth defined and labeled | |
| 9 | Associated data | a. Demographic; b. Clinical; c. Lab; d. Genomic; e. Timeline; f. Social media |
| 10 | Date range of image exam acquisition | |
| 11 | Log of dataset use | |
| 12 | Who owns the data | |
| 13 | Who is responsible for the data | |
| 14 | Allowable usage | |
| 15 | Access parameters | a. Accessibility; b. Costs and business agreements |
| 16 | Case distribution | a. % normals vs. abnormals; b. Summary of abnormal examinations (i. Number of examinations with each pathology) |
Many of these items are semantic, with further subcategories not listed here.
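Several baseline items can be held as structured fields, and item 16 (case distribution) can be derived directly from per-examination labels rather than entered by hand. A minimal sketch, assuming each examination carries exactly one label and that normal studies are labeled `"normal"`; the class and its fields are hypothetical and cover only a small subset of the list.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class DatasetCatalogEntry:
    """Hypothetical catalog record covering a few of the baseline items."""
    modality: str                      # item 1a
    num_examinations: int              # item 2
    sources: list[str]                 # item 3
    date_range: tuple[str, str]        # item 10
    owner: str                         # item 12
    allowable_usage: str               # item 14
    labels: list[str] = field(default_factory=list)  # one label per examination

    def case_distribution(self, normal_label: str = "normal") -> dict:
        """Item 16: percentage of normals vs. abnormals and counts per pathology."""
        total = len(self.labels)
        counts = Counter(self.labels)
        normals = counts.pop(normal_label, 0)  # remaining keys are pathologies
        pct_normal = 100.0 * normals / total if total else 0.0
        return {
            "percent_normal": pct_normal,
            "percent_abnormal": 100.0 - pct_normal if total else 0.0,
            "pathology_counts": dict(counts),
        }
```

Deriving the distribution from labels keeps item 16 consistent with the annotations (item 6) if the labels are later revised.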
The FAIR guiding principles for scientific data management—Findability, Accessibility, Interoperability, and Reusability (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4792175/, Creative Commons License)
| The FAIR guiding principles |
|---|
| To be Findable: |
| F1. (Meta)data are assigned a globally unique and persistent identifier. |
| F2. Data are described with rich metadata (defined by R1 below). |
| F3. Metadata clearly and explicitly include the identifier of the data it describes. |
| F4. (Meta)data are registered or indexed in a searchable resource. |
| To be Accessible: |
| A1. (Meta)data are retrievable by their identifier using a standardized communications protocol. |
| A1.1. The protocol is open, free, and universally implementable. |
| A1.2. The protocol allows for an authentication and authorization procedure, where necessary. |
| A2. Metadata are accessible, even when the data are no longer available. |
| To be Interoperable: |
| I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation. |
| I2. (Meta)data use vocabularies that follow FAIR principles. |
| I3. (Meta)data include qualified references to other (meta)data. |
| To be Reusable: |
| R1. (Meta)data are richly described with a plurality of accurate and relevant attributes. |
| R1.1. (Meta)data are released with a clear and accessible data usage license. |
| R1.2. (Meta)data are associated with detailed provenance. |
| R1.3. (Meta)data meet domain-relevant community standards. |
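The FAIR principles are technology-neutral, but a few of them reduce to presence checks that a dataset curator could automate. A minimal sketch, assuming a catalog record held as a Python dict; the field names and the function are hypothetical, and only F1/F3, R1.1, and R1.2 are approximated. Principles such as A1 (retrieval protocol) or I1 (knowledge representation language) cannot be verified this way.

```python
# (hypothetical metadata field, FAIR principle it stands in for)
CHECKS: list[tuple[str, str]] = [
    ("identifier", "F1/F3 persistent identifier recorded in the metadata"),
    ("license", "R1.1 data usage license"),
    ("provenance", "R1.2 provenance"),
]


def fair_gaps(record: dict) -> list[str]:
    """Return the principles whose minimal field check fails for `record`.

    A field counts as missing if it is absent or empty. This checks presence
    only; it cannot judge whether an identifier is actually persistent or
    whether a license is actually accessible.
    """
    return [principle for field_name, principle in CHECKS if not record.get(field_name)]
```

A record with an identifier and a license but no provenance would be flagged only for R1.2.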