Sinan Chen, Sachio Saiki, Masahide Nakamura.
Abstract
To implement fine-grained context recognition that is accurate and affordable for general households, we present a novel technique that integrates multiple image-based cognitive APIs and lightweight machine learning. Our key idea is to regard every image as a document by exploiting the "tags" that multiple APIs derive from it. The aim of this paper is to compare the performance of API-based models and to improve recognition accuracy while preserving affordability for general households. We present a novel method for further improving recognition accuracy based on multiple cognitive APIs and four modules: fork integration, majority voting, score voting, and range voting.
Keywords: cognitive APIs; context recognition; image; machine learning; majority voting; range voting; score voting; smart home
Year: 2020 PMID: 31991724 PMCID: PMC7038333 DOI: 10.3390/s20030666
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. Example of an image with multiple people misrecognized using a single cognitive API.
Figure 2. Example of recognizing an image using multiple cognitive APIs and majority voting.
Figure 3. Example of majority voting with ensemble learning.
Figure 4. Example of score voting using the total of each class probability.
Figure 5. Example of range voting using all class probabilities that scored above 70%, based on Figure 4.
Figure 6. Example of constructing a model by combining the features of multiple cognitive APIs.
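The three voting modules named in the captions for Figures 3 to 5 can be illustrated with a minimal Python sketch. It is one reading of those captions, not the authors' implementation: the function names, the per-model probability dictionaries, the example numbers, and the tie-breaking behavior are all assumptions made for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Each model votes for its top class; the most frequent class wins.
    Ties fall to the class counted first (an assumption, not from the paper)."""
    top_classes = [max(p, key=p.get) for p in predictions]
    return Counter(top_classes).most_common(1)[0][0]

def score_vote(predictions):
    """Sum each class's probability over all models; the largest total wins."""
    totals = {}
    for p in predictions:
        for label, prob in p.items():
            totals[label] = totals.get(label, 0.0) + prob
    return max(totals, key=totals.get)

def range_vote(predictions, threshold=0.7):
    """Like score voting, but only probabilities above `threshold` are counted.
    Returns None when no probability clears the threshold."""
    totals = {}
    for p in predictions:
        for label, prob in p.items():
            if prob > threshold:
                totals[label] = totals.get(label, 0.0) + prob
    return max(totals, key=totals.get) if totals else None

# Hypothetical class probabilities from three API-based models for one image.
predictions = [
    {"General meeting": 0.55, "Personal study": 0.45},
    {"General meeting": 0.60, "Personal study": 0.40},
    {"General meeting": 0.05, "Personal study": 0.95},
]

majority_vote(predictions)  # "General meeting" (two of three top votes)
score_vote(predictions)     # "Personal study" (total 1.80 versus 1.20)
range_vote(predictions)     # "Personal study" (only 0.95 exceeds 0.7)
```

In this made-up example, two weakly confident models outvote one strongly confident model under majority voting, while score voting and range voting let the confident model decide.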
The detail of each defined context in this experiment.
| Context Labels | The Contents of What the Images of Each Context Represent |
|---|---|
| Dining together | We often cook by ourselves to |
| General meeting | We are sitting together in a |
| Nobody | There is also the |
| One-to-one meeting | We often have a |
| Personal study | Sometimes the public computer is used for |
| Play games | We often gather around and |
| Room cleaning | The staff twice a week come for |
The representative images of the four contexts (including tag results from different APIs).
The representative images of the other three contexts (including tag results) and USB camera.
The accuracy results of each cognitive API-based model and three voting modules.
| Model or Voting Names | Overall Accuracy | Dining Together | General Meeting | Nobody | One-to-One Meeting | Personal Study | Play Games | Room Cleaning |
|---|---|---|---|---|---|---|---|---|
| Azure API – model | 0.8543 | 0.9550 | 0.8910 | 1.0000 | 0.6610 | 0.9170 | 0.8430 | 0.7650 |
| Watson API – model | 0.8000 | 0.8860 | 0.6730 | 0.8230 | 0.8040 | 0.9380 | 0.8040 | 0.7060 |
| Clarifai API – model | 0.9143 | 0.9090 | 0.9820 | 0.9110 | 0.8390 | 0.9170 | 0.9220 | 0.9220 |
| Imagga API – model | | 0.9550 | 0.9270 | 1.0000 | 0.8930 | 0.9580 | 0.9220 | 0.9610 |
| ParallelDots API – model | | 0.7950 | 0.8910 | 0.9330 | | 0.8750 | 0.6670 | 0.8040 |
| Majority voting | 0.9753 | 0.9565 | 1.0000 | 1.0000 | 1.0000 | 0.9561 | 1.0000 | 0.9572 |
| Score voting | 0.9776 | 1.0000 | 0.9685 | 1.0000 | 1.0000 | 0.9751 | 1.0000 | 0.9720 |
| Range voting (0.5 to 0.6) | | 1.0000 | 0.9836 | 1.0000 | 1.0000 | 0.9800 | 1.0000 | 1.0000 |
Figure 7. The distribution of accuracy results using range voting within the range 0 to 0.9.