Michael Rutherford, Seong K Mun, Betty Levine, William Bennett, Kirk Smith, Phil Farmer, Quasar Jarosz, Ulrike Wagner, John Freyman, Geri Blake, Lawrence Tarbox, Keyvan Farahani, Fred Prior.
Abstract
We developed a DICOM dataset that can be used to evaluate the performance of de-identification algorithms. DICOM objects (a total of 1,693 CT, MRI, PET, and digital X-ray images) were selected from datasets published in The Cancer Imaging Archive (TCIA). Synthetic Protected Health Information (PHI) was generated and inserted into selected DICOM Attributes to mimic typical clinical imaging exams. The DICOM Standard and TCIA curation audit logs guided the insertion of synthetic PHI into standard and non-standard DICOM data elements. A TCIA curation team tested the utility of the evaluation dataset. With this publication, the evaluation dataset (containing synthetic PHI) and de-identified evaluation dataset (the result of TCIA curation) are released on TCIA in advance of a competition, sponsored by the National Cancer Institute (NCI), for algorithmic de-identification of medical image datasets. The competition will use a much larger evaluation dataset constructed in the same manner. This paper describes the creation of the evaluation datasets and guidelines for their use.
Year: 2021 PMID: 34272388 PMCID: PMC8285420 DOI: 10.1038/s41597-021-00967-y
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Schematic description of the processing steps involved in the creation of the evaluation dataset and de-identified evaluation dataset.
Evaluation Dataset Characterization.
| Modality | Patients | Studies | Series | Images | Anatomy (# Studies) | Manufacturer (# Studies) |
|---|---|---|---|---|---|---|
| CT | 5 | 5 | 5 | 268 | BLADDER (4) CHEST (1) | GE MEDICAL SYSTEMS (2) PHILIPS (1) SIEMENS (1) TOSHIBA (1) |
| MR | 3 | 3 | 5 | 150 | KIDNEY (2) PELVIS (1) | GE MEDICAL SYSTEMS (1) SIEMENS (2) |
| PT | 5 | 5 | 6 | 1,203 | [BLANK] (1) BREAST (2) EXTREMITY (2) | GE MEDICAL SYSTEMS (4) SIEMENS (1) |
| DX | 4 | 4 | 4 | 10 | CHEST (4) | GE MEDICAL SYSTEMS (1) PHILIPS (3) |
| CR | 3 | 3 | 4 | 4 | CHEST (2) UTERUS (1) | FUJIFILM (3) |
| MG | 2 | 2 | 2 | 58 | BREAST (2) | LORAD (1) VICTRE (1) |
| Total | 21 | 22 | 26 | 1,693 | 22 | 22 |
This table describes the size of the dataset with totals for patients, studies, series, images, body part examined and manufacturers. (Note: VICTRE is not an equipment manufacturer, but a collection of synthetic image data). Imaging modalities are indicated using the DICOM conventions (CT = Computed Tomography, MR = Magnetic Resonance imaging, PT = Positron Emission Tomography, DX = Digital X-ray, CR = Computed Radiography, MG = Mammography).
Unusual DICOM attributes containing PHI.
| DICOM Tag | DICOM Description | Freq |
|---|---|---|
| (0008,0041) | Data Set Subtype | 1 |
| (0018,1250) | Receive Coil Name | 2 |
| (0018,7006) | Detector Description | 3 |
| (0010,0021) | Issuer of Patient ID | 4 |
| (0032,1030) | Reason for Study | 5 |
| (0008,1080) | Admitting Diagnoses Description | 6 |
| (0032,1000) | Scheduled Study Start Date | 11 |
| (0018,0010) | Contrast/Bolus Agent | 15 |
| (0018,1401) | Acquisition Device Processing Code | 29 |
| (0018,1000) | Device Serial Number | 31 |
| (0008,1010) | Station Name | 33 |
| (0032,1060) | Requested Procedure Description | 37 |
| (0008,2111) | Derivation Description | 44 |
| (3006,0006) | Structure Set Description | 50 |
| (3006,0008) | Structure Set Date | 57 |
| (0032,4000) | Study Comments | 70 |
| (0010,21b0) | Additional Patient History | 76 |
| (0032,1070) | Requested Contrast Agent | 101 |
| (0008,1030) | Study Description | 297 |
| (0010,4000) | Patient Comments | 1192 |
The table lists examples of unusual DICOM attributes and their frequency counts, as identified in the analysis of the TCIA audit logs.
Private DICOM Attributes containing PHI.
| DICOM Tag | DICOM Description | Freq |
|---|---|---|
| (0027,“GEMS_IMAG_01”,33) | ImagingOptions | 1 |
| (3f01,“INTELERAD MEDICAL SYSTEMS”,03) | SourceAE | 1 |
| (7005,“TOSHIBA_MEC_CT3”,1c) | Contrast/Bolus Agent for Series Record | 1 |
| (0009,“GEMS_PETD_01”,37) | Batch Description | 2 |
| (0045,“GEMS_SENO_02”,26) | MAOBuffer | 2 |
| (0009,“FDMS 1.0”,92) | KanjiDepartmentName | 3 |
| (0009,“GEMS_IDEN_01”,30) | ServiceId | 4 |
| (0043,“GEMS_PARM_01”,80) | Coil ID Data | 8 |
| (0021,“SIEMENS MR SDS 01”,19) | MR Phoenix Protocol | 15 |
| (0023,“GEMS_STDY_01”,70) | StartTimeSecsInFirstAxial | 156 |
The table lists examples of Private DICOM Attributes and their frequency counts, as identified in the analysis of the TCIA audit logs.
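The private tags in the table above appear to follow a three-part notation: a hexadecimal group number, a private creator string, and a two-digit hexadecimal element offset. As a minimal sketch under that reading, the following Python parses one such entry. The function name `parse_private_tag` and the regular expression are illustrative assumptions, not part of the paper or of any TCIA tool.

```python
import re

# Assumed notation: (gggg,"CREATOR",ee), with straight or curly quotes.
# re.search is used so that surrounding wrapper characters are ignored.
PRIVATE_TAG = re.compile(
    r'\(([0-9a-fA-F]{4}),["\u201c]([^"\u201d]+)["\u201d],([0-9a-fA-F]{2})\)'
)

def parse_private_tag(text):
    """Return (group, creator, element_offset) for a private tag string."""
    match = PRIVATE_TAG.search(text)
    if match is None:
        raise ValueError(f"not a private tag: {text!r}")
    group, creator, offset = match.groups()
    return int(group, 16), creator, int(offset, 16)
```

Splitting the creator string from the numeric parts makes it straightforward to group private attributes by vendor block when reviewing where PHI tends to appear.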
Attribute Types.
| Type | Description |
|---|---|
| Type 1: | Required to be in the SOP Instance and shall have a valid value. |
| Type 2: | Required to be in the SOP Instance but may contain the value of “unknown”, or a zero length value. |
| Type 3: | Optional. May or may not be included and could be zero length. |
| Type 1C: | Conditional. If a condition is met, then it is a Type 1 (required, cannot be zero). If condition is not met, then the tag is not sent. |
| Type 2C: | Conditional. If condition is met, then it is a Type 2 (required, zero length OK). If condition is not met, then the tag is not sent. |
The table displays Attribute Types as defined in the DICOM standard.
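The Attribute Type rules above can be expressed as a small presence-and-value check. The sketch below is illustrative only: it models a DICOM object as a plain Python dictionary keyed by tag string, and the function name `check_attribute` and the `condition_met` flag are assumptions introduced here. It follows the wording of the table, including the simplification that a conditional attribute is not sent when its condition is not met.

```python
def check_attribute(dataset, tag, attr_type, condition_met=True):
    """Return None if the attribute satisfies its Type, else a message.

    dataset is a hypothetical dict mapping tag strings to values.
    """
    present = tag in dataset
    value = dataset.get(tag)
    empty = value is None or value == ""

    # Conditional types behave like Type 1 or Type 2 when the condition holds.
    if attr_type in ("1C", "2C"):
        if not condition_met:
            if present:
                return f"{tag}: sent although its condition is not met"
            return None
        attr_type = attr_type[0]

    if attr_type == "3":
        return None  # optional: may be absent or zero length
    if not present:
        return f"{tag}: required (Type {attr_type}) but missing"
    if attr_type == "1" and empty:
        return f"{tag}: Type 1 requires a non-empty value"
    return None  # Type 2 may be present with a zero-length value
```

A de-identification tool that deletes a Type 1 or Type 2 attribute outright, instead of replacing or emptying its value, would fail this kind of check.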
General and modality specific data Attributes and Types as specified in the DICOM standard.
| Tag | Attribute | Type | Modality | Description |
|---|---|---|---|---|
| (0008,0016) | SOP Class UID | 1 | All | Uniquely identifies the SOP Class. |
| (0008,0020) | Study Date | 2 | All | Date the Study started. |
| (0008,0060) | Modality | 1 | All | Type of equipment that originally acquired the data used to create the images in this Series. |
| (0010,0010) | Patient’s Name | 2 | All | Patient’s full name. |
| (0020,0060) | Laterality | 2C | All | Laterality of (paired) body part examined. Required if the body part examined is a paired structure and Image Laterality (0020,0062) is not sent. |
| (0028,0004) | Photometric Interpretation | 1 | CR | Specifies the intended interpretation of the pixel data. |
| (0008,0008) | Image Type | 1 | CT | Image identification characteristics. |
| (0018,0060) | KVP | 2 | CT | Peak kilo voltage output of the x-ray generator used. |
| (0008,0068) | Presentation Intent Type | 1 | DX | Identifies the intent of the images that are contained within this Series. |
| (0008,0070) | Manufacturer | 2 | DX | Manufacturer of the equipment that produced the Composite Instances. |
| (0028,0120) | Pixel Padding Value | 1C | DX | Required if Pixel Padding Range Limit (0028,0121) is present and either Pixel Data (7FE0,0010) or Pixel Data Provider URL (0028,7FE0) is present. May be present otherwise only if Pixel Data (7FE0,0010) or Pixel Data Provider URL (0028,7FE0) is present. |
| (6000,3000) | Overlay Data | 1 | DX | Overlay pixel data. |
| (0018,1508) | Positioner Type | 1 | MG | MAMMOGRAPHIC or NONE |
| (0040,0318) | Organ Exposed | 1 | MG | Organ to which Organ Dose (0040,0316) applies. BREAST |
| (0028,0100) | Bits Allocated | 1 | MR | Number of bits allocated for each pixel sample. Each sample shall have the same number of bits allocated. |
| (0028,0101) | Bits Stored | 1 | MR | Number of bits stored for each pixel sample. Each sample shall have the same number of bits stored. |
| (0020,0032) | Image Position (Patient) | 1 | PT | The x, y, and z coordinates of the upper left hand corner (center of the first voxel transmitted) of the image, in mm. |
| (0020,0037) | Image Orientation (Patient) | 1 | PT | The direction cosines of the first row and the first column with respect to the patient. |
| (0008,0064) | Conversion Type | 1 | SC | Describes the kind of image conversion. |
“All” applies to all modalities. Per the DICOM standard, Type 1 is required, Type 1C is required if certain specified conditions are met, Type 2 is required but the value may be unknown (zero length), and Type 2C is a conditional Type 2. DICOM Type 3 data elements are optional.
Fig. 2 Schematic description of the standard TCIA Curation Workflow based on the Posda tool suite.
Re-Identification Operations.
| Operation | Description |
|---|---|
| set_tag | Set specified tag to given value |
| delete_tag | Delete specified tag |
| shift_date | Shift date based on given value |
| substitute | Modifies tag with existing value |
| string_replace | Substitutes text within a tag |
| annotate_img | Burns given text at given coordinates |
The table identifies operations utilized in the Posda tools to re-identify DICOM datasets with synthetic data.
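To make the operations in the table above concrete, here is a minimal sketch of four of them applied to a DICOM object modeled as a plain Python dictionary. This is not the Posda implementation; the function signatures are assumptions chosen for illustration. `substitute` and `annotate_img` are omitted because the table does not describe them in enough detail to sketch without guessing.

```python
from datetime import datetime, timedelta

def set_tag(ds, tag, value):
    """Set the specified tag to the given value."""
    ds[tag] = value

def delete_tag(ds, tag):
    """Delete the specified tag if it is present."""
    ds.pop(tag, None)

def shift_date(ds, tag, days):
    """Shift a DICOM date (YYYYMMDD string) by a number of days."""
    original = datetime.strptime(ds[tag], "%Y%m%d")
    ds[tag] = (original + timedelta(days=days)).strftime("%Y%m%d")

def string_replace(ds, tag, old, new):
    """Replace text inside the current value of a tag."""
    ds[tag] = ds[tag].replace(old, new)
```

For example, calling `set_tag(ds, "(0010,0010)", "ROGERS^BILLY")` would place a synthetic patient name into a previously de-identified object, and `string_replace` could embed a synthetic name inside an otherwise harmless Study Description.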
Answer key format.
| Scope | Tag | Tag Name | Action | Action Text |
|---|---|---|---|---|
| Study | (0008,0050) | Accession Number | text_removed | [“20130912E245583”] |
| Study | (0008,0080) | Institution Name | text_removed | [“Treetop Medical Center”] |
| Study | (0008,0090) | Referring Physician’s Name | text_removed | [“ROBERTSON^JESSE”] |
| Study | (0008,1050) | Performing Physician’s Name | text_removed | [“PHILLIPS^JOHN”] |
| Study | (0008,0050) | Accession Number | text_removed | [“20130912E801911”] |
| Study | (0008,1030) | Study Description | text_removed | [“Billy Rogers”] |
| Study | (0008,1030) | Study Description | text_retained | [“XR CHEST AP PORTABLE”] |
| Study | (0008,0090) | Referring Physician’s Name | text_removed | [“BAILEY^THERESA”] |
| Study | (0008,1050) | Performing Physician’s Name | text_removed | [“SMITH^MARY”] |
| Patient | (0010,0020) | Patient ID | text_removed | [“6774825273”] |
| Patient | (0010,0010) | Patient’s Name | text_removed | [“ROGERS^BILLY”] |
| Patient | (0010,0030) | Patient’s Birth Date | text_removed | [“19430722”] |
This table shows the format of the answer key used to compare the results of de-identification to the original evaluation dataset. The answer key is based on TCIA de-identification standards and TCIA best practice.
Answer Key actions.
| Action | Description |
|---|---|
| tag_retained | The tag itself is retained and present in the DICOM dataset |
| text_notnull | The value of the tag is not null or zero length value |
| text_retained | The text specified was retained in the tag value |
| text_removed | The text specified was removed from the tag value |
| date_shifted | The date was shifted using the specified shift value |
| uid_changed | The UID was updated according to curation crosswalk |
| pixels_hidden | The pixels within coordinates specified are hidden |
This table lists the actions used in the answer key to perform the comparisons. For example, tag_retained ensures that a tag was not removed, and date_shifted checks whether a date was shifted by a particular shift value.
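As an illustration of how the answer key actions could be checked, the sketch below evaluates a single answer key row against a de-identified object. It is not the TCIA scoring code. Objects are modeled as plain dictionaries, the function name `check_action` is hypothetical, and for date_shifted the action text is assumed to hold the shift in days. The uid_changed and pixels_hidden actions are left out because they depend on a curation crosswalk and on pixel data.

```python
from datetime import datetime, timedelta

def check_action(original, deidentified, tag, action, action_text=None):
    """Return True if the de-identified object satisfies one answer key row."""
    value = deidentified.get(tag)
    if action == "tag_retained":
        return tag in deidentified
    if action == "text_notnull":
        return value not in (None, "")
    if action == "text_retained":
        return value is not None and action_text in value
    if action == "text_removed":
        # A deleted tag also counts as the text having been removed.
        return value is None or action_text not in value
    if action == "date_shifted":
        # Assumption: action_text is the shift expressed in days.
        shifted = datetime.strptime(original[tag], "%Y%m%d") + timedelta(
            days=int(action_text)
        )
        return value == shifted.strftime("%Y%m%d")
    raise ValueError(f"unsupported action: {action}")
```

Pairing text_removed with text_retained on the same tag, as the Study Description rows in the answer key do, checks both that PHI was removed and that clinically useful text was not removed along with it.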
| Measurement(s) | Deidentification • Clinical Data |
| Technology Type(s) | data synthesis • digital curation |
| Factor Type(s) | imaging type |
| Sample Characteristic - Organism | Homo sapiens |