Andre G C Pacheco, Gustavo R Lima, Amanda S Salomão, Breno Krohling, Igor P Biral, Gabriel G de Angelo, Fábio C R Alves, José G M Esgario, Alana C Simora, Pedro B C Castro, Felipe B Rodrigues, Patricia H L Frasson, Renato A Krohling, Helder Knidel, Maria C S Santos, Rachel B do Espírito Santo, Telma L S G Macedo, Tania R P Canuto, Luíz F S de Barros.
Abstract
Over the past few years, different Computer-Aided Diagnosis (CAD) systems have been proposed to tackle skin lesion analysis. Most of these systems work only on dermoscopy images, since there is a severe lack of public archives of clinical images on which to evaluate them. To fill this gap, we release a skin lesion benchmark composed of clinical images collected from smartphone devices and a set of patient clinical data containing up to 21 features. The dataset consists of 1373 patients, 1641 skin lesions, and 2298 images for six different diagnostics: three skin diseases and three skin cancers. In total, 58.4% of the skin lesions are biopsy-proven, including 100% of the skin cancers. By releasing this benchmark, we aim to support future research and the development of new tools to assist clinicians in detecting skin cancer.
Keywords: Cancer research; Clinical data; Computer-Aided Diagnosis (CAD); Skin cancer; Skin lesion
Year: 2020 PMID: 32939378 PMCID: PMC7479321 DOI: 10.1016/j.dib.2020.106221
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Description of each attribute present in the metadata CSV file.
| Attribute | Description |
|---|---|
| patient_id | a string representing the patient ID – example: PAT_1234 |
| lesion_id | a string representing the lesion ID – example: 123 |
| img_id | a string representing the image ID, which is a composition of the patient ID, lesion ID, and a random number – example: PAT_1234_123_000 |
| smoke | a boolean to map if the patient smokes cigarettes |
| drink | a boolean to map if the patient consumes alcoholic beverages |
| background_father and background_mother | a string representing the country from which the patient’s father and mother descend. Note: many patients descend from Pomerania, a region between Poland and Germany. Although it is not a country, we decided to keep the nomenclature, since they identify themselves as descendants of Pomeranians. |
| age | an integer representing the patient’s age |
| pesticide | a boolean to map if the patient uses pesticides |
| gender | a string representing the patient’s gender |
| skin_cancer_history | a boolean to map if the patient or someone in their family has had skin cancer in the past |
| cancer_history | a boolean to map if the patient or someone in their family has had any type of cancer in the past |
| has_piped_water | a boolean to map if the patient has access to piped water in their home |
| has_sewage_system | a boolean to map if the patient has access to a sewage system in their home |
| fitspatrick | an integer representing the Fitzpatrick skin type |
| region | a string representing one of the 15 macro-regions previously described |
| diameter_1 and diameter_2 | a float representing the skin lesions’ horizontal and vertical diameters |
| diagnostic | a string representing the skin lesion diagnostic – BCC, SCC, ACK, SEK, MEL, or NEV |
| itch | a boolean to map if the skin lesion itches |
| grew | a boolean to map if the skin lesion has recently grown |
| hurt | a boolean to map if the skin lesion hurts |
| changed | a boolean to map if the skin lesion has recently changed |
| bleed | a boolean to map if the skin lesion has bled |
| elevation | a boolean to map if the skin lesion has an elevation |
| biopsed | a boolean to map if the diagnostic was confirmed by biopsy (true) or comes from clinical consensus (false) |
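
The attribute list above can be turned into a small loader. The following is a minimal sketch, not the authors' tooling: the function names (`parse_img_id`, `parse_bool`, `read_metadata`) are hypothetical, and it assumes that booleans are stored as text such as `True`/`False`, that any other value means "unknown", and that an image ID may carry an optional `.png` extension. None of these encoding details are stated in the attribute table.

```python
import csv
import io
from typing import NamedTuple, Optional


class ImageId(NamedTuple):
    patient_id: str
    lesion_id: str
    suffix: str


def parse_img_id(img_id: str) -> ImageId:
    """Split an ID such as 'PAT_1234_123_000' into patient, lesion, and suffix."""
    # Assumption: a trailing '.png' may or may not be present; drop it if so.
    stem = img_id[:-4] if img_id.lower().endswith(".png") else img_id
    # Split from the right so the patient ID keeps its own underscore.
    patient_id, lesion_id, suffix = stem.rsplit("_", 2)
    return ImageId(patient_id, lesion_id, suffix)


def parse_bool(value: str) -> Optional[bool]:
    """Map textual booleans to bool; anything else is treated as unknown (None)."""
    text = value.strip().lower()
    if text in {"true", "1", "yes"}:
        return True
    if text in {"false", "0", "no"}:
        return False
    return None


# Boolean attributes, named exactly as in the attribute table.
BOOL_COLUMNS = (
    "smoke", "drink", "pesticide", "skin_cancer_history", "cancer_history",
    "has_piped_water", "has_sewage_system", "itch", "grew", "hurt",
    "changed", "bleed", "elevation", "biopsed",
)


def read_metadata(handle) -> list:
    """Read metadata rows from an open CSV file, coercing booleans and age."""
    rows = []
    for raw in csv.DictReader(handle):
        row = dict(raw)
        for col in BOOL_COLUMNS:
            if col in row:
                row[col] = parse_bool(row[col] or "")
        if "age" in row:
            age = (row["age"] or "").strip()
            row["age"] = int(age) if age.isdigit() else None
        rows.append(row)
    return rows


# Hypothetical two-line sample, built from the examples in the attribute table.
SAMPLE = (
    "patient_id,lesion_id,img_id,smoke,age,diagnostic,biopsed\n"
    "PAT_1234,123,PAT_1234_123_000,False,55,BCC,True\n"
)

if __name__ == "__main__":
    for row in read_metadata(io.StringIO(SAMPLE)):
        print(parse_img_id(row["img_id"]), row["diagnostic"], row["biopsed"])
```

Returning `None` for unrecognized boolean values keeps missing answers distinct from an explicit "no", which matters when a feature such as `bleed` was simply not recorded.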
Fig. 1 An illustration of the software structure that we developed to collect data at the PAD.
Fig. 2 Data collection workflow of the PAD-UFES-20 dataset.
Fig. 3 A sample of each type of skin lesion present in the PAD-UFES-20 dataset.
The number of samples and the percentage of biopsy-proven samples for each type of skin lesion present in the PAD-UFES-20 dataset.
| Diagnostic | Samples | % biopsied |
|---|---|---|
| Actinic Keratosis (ACK) | 730 | 24.4% |
| Basal Cell Carcinoma of skin (BCC) | 845 | 100% |
| Malignant Melanoma (MEL) | 52 | 100% |
| Melanocytic Nevus of Skin (NEV) | 244 | 24.6% |
| Squamous Cell Carcinoma (SCC) | 192 | 100% |
| Seborrheic Keratosis (SEK) | 235 | 6.4% |
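
As a quick consistency check, the per-class figures in this table can be combined into a total sample count and an overall biopsy-proven rate. The sketch below only re-does arithmetic on the numbers printed in the table; the variable names are illustrative.

```python
# (samples, % biopsied) per diagnostic, copied from the table above.
SAMPLES = {
    "ACK": (730, 24.4),
    "BCC": (845, 100.0),
    "MEL": (52, 100.0),
    "NEV": (244, 24.6),
    "SCC": (192, 100.0),
    "SEK": (235, 6.4),
}

# Total number of samples across the six diagnostics.
total = sum(count for count, _ in SAMPLES.values())

# Estimated number of biopsy-proven samples (percentages are rounded in the table).
biopsied = sum(count * pct / 100 for count, pct in SAMPLES.values())

# Overall biopsy-proven rate, as a percentage of all samples.
overall_pct = 100 * biopsied / total

# Share of samples that belong to the three skin cancers (BCC, SCC, MEL).
cancer_pct = 100 * sum(SAMPLES[d][0] for d in ("BCC", "SCC", "MEL")) / total

if __name__ == "__main__":
    print(f"total samples: {total}")
    print(f"overall biopsy-proven: {overall_pct:.1f}%")
    print(f"skin cancer share: {cancer_pct:.1f}%")
```

The total comes to 2298, which matches the number of images given in the abstract, and the weighted rate rounds to 58.4%, the overall biopsy-proven figure quoted there.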
Fig. 4 The patients’ age distribution according to gender and the age boxplots for each diagnostic.
| Specification | Description |
|---|---|
| Subject | Cancer Research and Computer Vision and Pattern Recognition |
| Specific subject area | Automated Skin Cancer detection |
| Type of data | Images and Metadata |
| How data were acquired | All data were collected through smartphone devices using an application developed specifically for this work |
| Data format | Portable Network Graphics (PNG) and Comma Separated Values (CSV) file formats |
| Parameters for data collection | All data were collected during the patient appointment |
| Description of data collection | Each sample in this dataset consists of a clinical image and a set of patient clinical data that contains up to 21 features |
| Data source location | Institution: Federal University of Espírito Santo (UFES) Region: Espírito Santo Country: Brazil |
| Data accessibility | Dataset is available on |
| Related research article | Andre G. C. Pacheco and Renato A. Krohling, "The impact of patient clinical information on automated skin cancer detection." Computers in Biology and Medicine 116 (2020): 103545. https://doi.org/10.1016/j.compbiomed.2019.103545 |