Elhassan Mohamed, Konstantinos Sirlantzis, Gareth Howells.
Abstract
The dataset provides annotated images for pixel classification tasks, with application to powered wheelchair users. Because widely available datasets mostly contain general objects, we introduce this dataset to cover the missing application-specific objects. These objects of interest matter not only to powered wheelchair users but also to indoor navigation and environmental understanding in general. For example, indoor assistive and service robots need to comprehend their surroundings to ease navigation and interaction with objects of different sizes. The dataset is recorded using a camera installed on a powered wheelchair. The camera is mounted beneath the joystick so that it has a clear view with no obstruction from the user's body or legs. The powered wheelchair is then driven through the corridors of the indoor environment, and a one-minute video is recorded. The collected video is annotated at the pixel level, using MATLAB software, for semantic segmentation (pixel classification) tasks. The dataset contains objects of various sizes (small, medium, and large), which explains the variation in the pixel distribution across classes. Deep Convolutional Neural Networks (DCNNs) that perform well on large objects usually fail to produce accurate results on small objects, whereas training a DCNN on a dataset with objects of multiple sizes can yield more robust systems. Although all the recorded objects are vital for many applications, we have included more images of different kinds of door handles, at different angles, orientations, and illuminations, as they are rare in publicly available datasets. The dataset has 1549 images and covers nine classes. We used the dataset to train and test a semantic segmentation system that can aid and guide visually impaired users by providing visual cues.
The dataset is made publicly available at this link.
Keywords: Convolutional neural network; Deep learning; Door handles; Image dataset; Indoor objects; Pixels classification; Semantic segmentation
Year: 2022 PMID: 35036489 PMCID: PMC8749214 DOI: 10.1016/j.dib.2022.107791
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Camera installation for data collection.
Fig. 2. Indoor classes of interest of the proposed dataset.
Fig. 3. Examples from the collected dataset, with the first row showing the images and the second row showing the corresponding pixel annotations.
Fig. 4. Pixel distribution in the proposed dataset.
The number of annotated pixels per class and the number of object instances.
| Class | Pixel count (Million) | Number of instances |
|---|---|---|
| Door | 239.87 | 1742 |
| Pull door handle | 0.95 | 173 |
| Push button | 0.63 | 159 |
| Moveable door handle | 2.87 | 1134 |
| Push door handle | 0.78 | 262 |
| Fire extinguisher | 4.25 | 486 |
| Key slot | 0.78 | 216 |
| Carpet floor | 20.32 | 698 |
| Background wall | 96.40 | 398 |
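
The pixel counts in the table above span more than two orders of magnitude (0.63 to 239.87 million), so a segmentation loss trained on them is likely to be dominated by the large classes. The Python sketch below turns the table into per-class pixel shares, a simple median-to-class frequency weight, and the mean object size in pixels. The weighting scheme is an illustrative assumption and is not described in the source; the function names are hypothetical.

```python
from statistics import median

# Pixel counts (millions) and instance counts copied from the table above.
CLASS_STATS = {
    "Door": (239.87, 1742),
    "Pull door handle": (0.95, 173),
    "Push button": (0.63, 159),
    "Moveable door handle": (2.87, 1134),
    "Push door handle": (0.78, 262),
    "Fire extinguisher": (4.25, 486),
    "Key slot": (0.78, 216),
    "Carpet floor": (20.32, 698),
    "Background wall": (96.40, 398),
}


def pixel_shares(stats):
    """Fraction of all annotated pixels that belongs to each class."""
    total = sum(pixels for pixels, _ in stats.values())
    return {name: pixels / total for name, (pixels, _) in stats.items()}


def median_ratio_weights(shares):
    """Assumed weighting: median class share divided by each class share.

    Rare classes get weights above 1, dominant classes below 1.
    """
    med = median(shares.values())
    return {name: med / share for name, share in shares.items()}


def mean_pixels_per_instance(stats):
    """Average annotated area of one object instance, in pixels."""
    return {name: pixels * 1e6 / count for name, (pixels, count) in stats.items()}


if __name__ == "__main__":
    shares = pixel_shares(CLASS_STATS)
    weights = median_ratio_weights(shares)
    sizes = mean_pixels_per_instance(CLASS_STATS)
    for name in CLASS_STATS:
        print(f"{name:22s} share={shares[name]:.4f} "
              f"weight={weights[name]:8.2f} px/instance={sizes[name]:10.0f}")
```

Weights of this kind could be passed to a weighted cross-entropy loss; whether that helps on this dataset would need to be checked experimentally.
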

| Item | Details |
|---|---|
| Subject | Computer science: Artificial Intelligence |
| Specific subject area | The provided dataset is annotated on the pixel level. Consequently, it is suitable for semantic segmentation (pixel classification) tasks. |
| Type of data | Images |
| How data were acquired | A one-minute video was collected while driving the powered wheelchair through the corridors of the indoor environment. The camera used to record the video is installed beneath the joystick of the powered wheelchair, 68 cm above the ground. The data were then processed using MATLAB software (Video Labeler) to annotate the video frames at the pixel level. The output of this process is two folders containing the images and the corresponding annotations. |
| Data format | Original images (.png) |
| Parameters for data collection | We recorded a high-resolution video using the Intel® RealSense™ camera so that small objects, such as door handles, comprise many pixels. This can enhance feature extraction and, consequently, lead to more accurate systems. The camera position beneath the joystick was chosen for three reasons: the camera is integrated into the powered wheelchair body, the user's body does not obstruct the camera's view, and the camera has a perspective comparable to the user's field of view. |
| Description of data collection | The collected dataset has two folders, one for the images and one for the annotated pixels. We also include two MATLAB files for the image and pixel-label datastores, which can be loaded in MATLAB. Please note that the source file paths in both datastores should be modified to point to the new location of the image and pixel-label folders. |
| Data source location | Institution: School of Engineering and Digital Arts, University of Kent |
| Data accessibility | Repository name: Mendeley Data |
| Related research article | E. Mohamed, K. Sirlantzis and G. Howells, "Indoor/Outdoor Semantic Segmentation Using Deep Learning for Visually Impaired Wheelchair Users," in IEEE Access, vol. 9, pp. 147914-147932, 2021, doi: |
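
The two-folder layout described in the table above can also be read outside MATLAB. The Python sketch below pairs each image with its annotation file. It assumes that an image and its annotation share the same file name stem and that the annotations are stored as `.png` files. The source states only that the original images are `.png`, so both assumptions, the folder names `images` and `labels`, and the function name are hypothetical.

```python
from pathlib import Path


def pair_images_with_labels(images_dir, labels_dir, suffix=".png"):
    """Return a sorted list of (image_path, label_path) tuples.

    Assumption: a label file has the same stem as its image file.
    Raises FileNotFoundError if any image has no matching label.
    """
    images_dir = Path(images_dir)
    labels_dir = Path(labels_dir)

    # Index the label files by stem for constant-time lookup.
    labels = {p.stem: p for p in labels_dir.glob(f"*{suffix}")}

    pairs = []
    missing = []
    for image_path in sorted(images_dir.glob(f"*{suffix}")):
        label_path = labels.get(image_path.stem)
        if label_path is None:
            missing.append(image_path.name)
        else:
            pairs.append((image_path, label_path))

    if missing:
        raise FileNotFoundError(f"No label found for: {missing}")
    return pairs


# Example usage (hypothetical folder names):
# pairs = pair_images_with_labels("dataset/images", "dataset/labels")
# print(len(pairs))
```

Failing loudly on a missing label is a deliberate choice here: silently skipping unpaired frames would make it hard to notice an incomplete download or a wrong path.
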