Norah N Alajlan, Dina M Ibrahim.
Abstract
Recently, the Internet of Things (IoT) has gained a lot of attention, as IoT devices are deployed in many fields. Many of these devices rely on machine learning (ML) models, which make them intelligent and able to make decisions. IoT devices typically have limited resources, which restricts the execution of complex ML models such as deep learning (DL) models on them. In addition, connecting IoT devices to the cloud to transfer raw data and perform processing delays system responses, exposes private data, and increases communication costs. To tackle these issues, a new technology called Tiny Machine Learning (TinyML) has paved the way to meet the challenges of IoT devices. This technology allows data to be processed locally on the device without sending it to the cloud. In addition, TinyML permits the inference of ML models, including DL models, on devices with limited resources, such as microcontrollers. The aim of this paper is to provide an overview of the revolution of TinyML and a review of TinyML studies. The main contribution is an analysis of the types of ML models used in TinyML studies; the paper also presents details of the datasets and the types and characteristics of the devices, with the aim of clarifying the state of the art and envisioning development requirements.
Keywords: Internet of Things; deep learning; edge devices; machine learning; tiny machine learning
Year: 2022 PMID: 35744466 PMCID: PMC9227753 DOI: 10.3390/mi13060851
Source DB: PubMed Journal: Micromachines (Basel) ISSN: 2072-666X Impact factor: 3.523
Figure 1. A framework of IoT applications with cloud computing, edge computing and TinyML.
Figure 2. Comparison between the main characteristics of devices powered by either a microcontroller or a microprocessor.
Search plan/approach.
| Source | Criteria |
|---|---|
| Database | web-based resources and Web of Science |
| Date of publication | 2019–2021 |
| Keywords | TinyML |
| Language | English |
| Type of publication | Conference Proceedings |
| Inclusion criteria | TinyML Use Cases paper |
| Exclusion criteria | Challenges and directions paper |
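
The screening criteria in the table above can be read as a simple predicate over candidate records. The following is a minimal sketch under stated assumptions, not the authors' actual procedure: the `Record` type, its field names, the `category` labels, and the choice to match the keyword against the title are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical bibliographic record; field names are illustrative only."""
    title: str
    year: int
    language: str
    pub_type: str
    category: str  # assumed labels: "use case" or "challenges and directions"


def passes_screening(record: Record) -> bool:
    """Apply the date, keyword, language, type, and inclusion criteria from the table."""
    return (
        2019 <= record.year <= 2021
        and "tinyml" in record.title.lower()  # assumption: keyword matched in the title
        and record.language == "English"
        and record.pub_type == "Conference Proceedings"
        and record.category == "use case"  # also excludes "challenges and directions"
    )


# Hypothetical records, used only to exercise the predicate.
candidates = [
    Record("TinyML gesture recognition", 2020, "English", "Conference Proceedings", "use case"),
    Record("TinyML: challenges ahead", 2021, "English", "Conference Proceedings", "challenges and directions"),
    Record("TinyML keyword spotting", 2018, "English", "Conference Proceedings", "use case"),
]
selected = [r.title for r in candidates if passes_screening(r)]
print(selected)
```

Only the first hypothetical record satisfies every criterion; the second is excluded by category and the third by publication year.
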
Figure 3. PRISMA flowchart for the study.
Comparison between earlier studies related to DL methodologies based on the used models, the results, and the inference devices.
| Study | Model | Desktop ACC | Desktop Model Size | Desktop Platform | Device Name | Device Platform | Deployed Metrics | Deployed Latency | Deployed RAM | Deployed Flash Memory |
|---|---|---|---|---|---|---|---|---|---|---|
| [ | SVM | 84% | - | - | All devices | STM X-Cube-AI expansion package, and C language platform | All 84% | <1 ms | - | - |
| | ANN1 | 99%, <1 ms | - | - | F746ZG | | Both 99% | 1 ms | - | - |
| | | | | | H743ZI2 | | | | | |
| | KNN | 99%, <1 ms | - | - | F746ZG | | Both 92% | Both 10 ms | - | - |
| | | | | | H743ZI2 | | | | | |
| | ANN2 | 99%, <1 ms | - | - | F746ZG | | Both 99% | Both <1 ms | - | - |
| | | | | | H743ZI2 | | | | | |
| | DT | 99%, <1 ms | - | - | F746ZG | | Both 99% | Both <1 ms | - | - |
| | | | | | H743ZI2 | | | | | |
| | ANN3 | 0.86 | - | - | F401RE | | 0.86 R2 | <1 ms | - | - |
| | | | | | F746ZG | | | | | |
| | | | | | H743ZI2 | | | | | |
| | | | | | L452RE | | | | | |
| [ | NN | 97.25% | 15 MB | TFLite and TFLiteConverter | F746ZG | X-CUBE-AI tool | 100% | 330 ms | 135.68 | 668.97 |
| [ | CNN1 | 98.53% | 185 KB | TF Lite | OpenMV H7 board STM32H743VI. | TF-Convert | 95.28% 98.84% | 20 FPS | - | - |
| [ | CNN | 99.83% | 1.5 MB | - | OpenMV H7 STM32H743VI. | - | 99.83% | 30 FPS | - | - |
| | SqueezeNet | 98.50% | 8.0 MB | | | | 98.53% | | | |
| | SqueezeNet2 | 98.93% | 3.8 MB | | | | 98.99% | | | |
| [ | Keras | 19% | - | TensorFlow | Taiyo Yuden EYSHSNZWZ NRF52 | - | - | - | - | - |
| | LSTM | 93% | 2.8 MB | TensorFlow | - | Not supported by TensorFlow Lite Micro | - | - | - | - |
| [ | RNN | 61% | - | - | ATMega4809 | TensorFlow. | 84% | Both 40 Hz | Both 2 KB | Both 32 KB |
| 93% | ||||||||||
| [ | NN | - | - | - | ESP32 | Arduino-LMIC software | 99.33% indoor | 2 min per activity. | - | - |
| [ | TinySpeech-X. | 96.4% | - | TensorFlow Lite for Microcontroller | - | - | - | - | - | - |
| | TinySpeech-Y | 93.6% | 48.8 KB | | | | | | | |
| | TinySpeech-Z | 92.4% | 21.6 KB | | | | | | | |
| | TinySpeech-M | 91.9% | - | | | | | | | |
| [ | LeNet5 model | 99.53% | - | PyTorch | STM32 L476 board | X-Cube-AI (float32 operations) | - | 14.15 ms | 80 MHz | - |
| | | | | | NXP k64f | ARM CMSIS-NN | - | 0.97 | 120 MHz | - |
| | | | | | GAP8 | PULP-NN | - | 1000 fps with 1 ms | - | - |
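
The latency column above mixes frame rates (FPS) and per-inference times (ms), which makes rows hard to compare directly. The sketch below converts between the two, assuming frames are processed strictly one after another so that throughput is the reciprocal of latency; that assumption is ours and is not stated in the source. The function names are illustrative.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Average time per frame in milliseconds, assuming sequential processing."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def latency_ms_to_fps(latency_ms: float) -> float:
    """Frames per second implied by a per-frame latency in milliseconds."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


# Frame rates that appear in the table above.
for fps in (20, 30, 1000):
    print(f"{fps} FPS is about {fps_to_latency_ms(fps):.2f} ms per frame")

# A latency that appears in the table above, expressed as a frame rate.
print(f"14.15 ms is about {latency_ms_to_fps(14.15):.1f} FPS")
```

Under this assumption, the "1000 fps with 1 ms" entry is internally consistent, since 1000 frames per second corresponds to 1 ms per frame.
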
Comparison between earlier studies related to the design of TinyML frameworks and libraries, based on the models used, the results, and the inference devices.
| Study | Model | Desktop ACC | Desktop Model Size | Desktop Platform | Device Name | Device Platform | Deployed Metrics | Deployed Latency | Deployed RAM | Deployed Flash Memory |
|---|---|---|---|---|---|---|---|---|---|---|
| [ | VWW model | - | - | - | Sparkfun Edge | TensorFlow Lite Micro | - | - | 4.857 KB | 81.79 KB |
| [ | TinyNAS model and TinyEngine library | - | - | - | STM32F746 | MCUNet | 61.8% | 49.5% at 5 FPS and 40.5% at 10 FPS | 0.49 MB | 1.9 MB |
| 87% | 89% at 5 FPS and 87% at 10 FPS | 91 KB | <140 MB | |||||||
| - | 94% at 5 FPS and 91% at 10 FPS | - | <124 MB | |||||||
| 36.4 KB | ||||||||||
| [ | MobileNet-V2 | - | - | - | STM32H747I-Disco | TensorFlow Lite and MbedOS | 88% | 220 ms | 138,240 KB | Matrix Size: 611,912 |
| OpenMV Cam H7 | ||||||||||
| [ | MobileNet-V1 | - | 1.97 MB | STM32H743 SoC | CMix-NN | 68.2% | 1.86 s | - | - | |
| [ | ProxyNAS with FT-Full | - | - | Computation theory | - | - | - | - | 391 MB | - |
| ProxyNAS with tinyTL | 65 MB | |||||||||
| Inception-V3 with FT-Full | 850 MB | |||||||||
| TinyTL with FA | 66 MB | |||||||||
Figure 4. Dataset type distribution in previous studies.
Summary of datasets used in TinyML studies including input type.
| Input Type | Dataset | Reference |
|---|---|---|
| Images | Handwritten digits | [ |
| | Sign MNIST dataset from Kaggle | [ |
| | Kaggle ASL dataset (26 classes) | [ |
| | ASL dataset created by authors | [ |
| | Kaggle ASL Alphabet test set | [ |
| | Face Mask 12 K Images dataset from Kaggle | [ |
| | Face Mask Classification dataset from Kaggle | [ |
| | Face Mask Dataset created by authors | [ |
| | Face Mask testing dataset created by authors | [ |
| | ImageNet (Flower, CUB, Pets, Food, CIFAR10 and CIFAR100) | [ |
| | Visual Wake Word (VWW) | |
| Physiological/Behavioral Metrics | Heart dataset | [ |
| | Hand gestures (Forward, Backward, Select and Abort) recorded using fingers, created by authors | [ |
| | Hand gesture data (0–9) created by authors | [ |
| | Foot gesture and activity data created by authors | [ |
| Data | Virus dataset | [ |
| | Sonar dataset | [ |
| | Peugeot 14 | [ |
| | Peugeot 15 | [ |
| | EnviroCar | [ |
| | Air Quality Index (AQI) | [ |
| | Dset-2.0 created by authors | [ |
| | Dset-1.5 created by authors | [ |
| | Dset-1.0 created by authors | [ |
| Audio | Google Speech Commands (GSC) | [ |
Summary of dataset descriptions used in previous TinyML-based studies.
| Study | Dataset | Description | Total | Training Dataset | Testing Dataset |
|---|---|---|---|---|---|
| [ | Heart dataset | Heart dataset produced by the University of California Irvine (UCI), contains 13 features. In these, 0 represents an absence of coronary heart disease (CHD) in the patient and labels 1–4 represent the presence of CHD | 300 data | - | - |
| | Virus dataset | Developed to be used in data traffic analysis | - | - | - |
| | Sonar dataset | Contains sonar system readings for two classes (Mines and Rocks) for materials analysis | 208 data | - | - |
| | Peugeot 14 | Contains different parameters from cars to predict road surface | 8615 data | - | - |
| | Peugeot 15 | Contains different parameters from cars to predict the traffic | 8615 data | - | - |
| | EnviroCar | Contains anonymized tracks of car measurements collected by citizen bus | Around 1.7 million data points | - | - |
| | AQI | Air Quality Index (AQI) dataset includes measurements of air quality for one year in Australia. | Real-time data from website | - | - |
| [ | Handwritten Digits | Handwritten digit images from (0 to 9) | 70,000 images | 60,000 | 10,000 |
| [ | Sign MNIST dataset | Used 24 of the 26 letters of the alphabet in English, leaving out the letters J and Z | 34,627 images | 27,455 images | 7172 images |
| | Kaggle ASL dataset | Used 3000 images per class, 24 classes for 26 letters of the alphabet in English, except for J and Z | 72,024 images | 72,000 images | 24 images |
| | Sign Language dataset | Used 400 images for each of the 24 classes; used OpenCV INTER_AREA interpolation to downscale the images to 28 × 28 | 400 images | 40 images | 360 images |
| | Kaggle ASL Alphabet test set | 30 images for each of the 24 classes, used as a generalization dataset or as the final test set | 720 images | - | - |
| [ | Face Mask 12 K Images Dataset | Faces images with/without a mask with a variety of backgrounds and cropped to face region | 58,960 images. | 58,960 images | - |
| | Face Mask Classification dataset | Face images with/without a mask | 22,200 images | 22,200 images | - |
| | Medical face OpenMV | Used the OpenMV Cam H7 camera to create a dataset. The images were 200 × 200 in size and were saved on the SD card of the development board | 49,895 images | 49,895 images | - |
| | Medical face testing dataset OpenMV Dataset | Used OpenMV camera to create a dataset | 4794 images | 4794 images | - |
| [ | Hand Gesture data from numbers (0–9) | Data for 10 numbers of gestures (from 0 to 9) | 1000 gestures | - | - |
| [ | Foot gestures and activity data | Set of data activities (walking, jogging, standing) and two gestures (double tap at the toe tip and double tap at heel). | 30,000 data | 24,000 data | 6000 data |
| [ | Hand gestures | A set of gestures, such as Forward, Backward, Select and Abort, was recorded using fingers, arms and entire hands. | - | -RNN 15 data recorded then augmented to 540 | 24 signals |
| [ | Google Speech Commands (GSC) dataset | 65,000 one-second verbal commands of short words with background noise. | 65,000 | - | - |
| [ | Dset-2.0 | Dset-2.0 contains samples (clear images) with (2.0 ms) | - | 1000 samples | 300 samples |
| | Dset-1.5 | Dset-1.5 contains samples (low-contrast images) with (1.5 ms) | - | 1000 samples | 300 samples |
| | Dset-1.0 | Dset-1.0 contains samples (low-contrast images) with (1.0 ms) | - | 1000 samples | 300 samples |
| [ | ImageNet | The standard large-scale benchmark for image classification, consisting of 1000 object categories that contain internal and leaf nodes but do not overlap with each other. | 10,000 images | - | - |
| | Wake word: Visual Wake Word (VWW) | A set of natural images of complex everyday scenes. Each image is labeled 1 (Person present) or 0 (Not Person) | 5000 images | - | - |
| | Wake word: Google Speech Commands (GSC) dataset | An audio dataset for keyword spotting (e.g., “Hey Siri”), which requires classifying a spoken word from a vocabulary of size 35. | - | - | - |
| [ | ImageNET Dataset | Set of 1000 object categories contains internal and leaf nodes. | 200,000 images | 50,000 images | 150,000 images |
| [ | VWW | A set of natural images of complex everyday scenes. Each image is labeled 1 (Person present) or 0 (Not Person) | 115,387 | 115,287 images | 100 images |
| [ | 9 datasets (Flower, Cars, CUB, food, Pets, Aircraft, CIFAR10, CIFAR-100 and CelebA) | Used ImageNet at pre-train on eight object classification datasets (Flower, Cars, CUB, Food, Pets, Aircraft, CIFAR10 and CIFAR-100) | - | - | - |
| [ | Visual Wake Word (VWW) | Each image is labeled 1 (Person present) or 0 (Not Person) | 115 k | 115 k images | 8 k |
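
The training and testing counts in the table above are easier to compare as proportions. The sketch below computes the train/test fractions for three rows whose counts are copied from the table; the helper name `split_fractions` and the dictionary layout are our own, and the source does not itself report these ratios.

```python
def split_fractions(train: int, test: int) -> tuple[float, float]:
    """Return (train fraction, test fraction) of the combined sample count."""
    total = train + test
    if total <= 0:
        raise ValueError("train + test must be positive")
    return train / total, test / total


# (training count, testing count) as listed in the table above.
splits = {
    "Handwritten Digits": (60_000, 10_000),
    "Sign MNIST dataset": (27_455, 7_172),
    "Foot gestures and activity data": (24_000, 6_000),
}

for name, (train, test) in splits.items():
    train_frac, test_frac = split_fractions(train, test)
    print(f"{name}: {train_frac:.1%} train / {test_frac:.1%} test")
```

The same check can be used to spot rows whose listed total does not equal the sum of the two splits.
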
Figure 5. Distribution of machine learning and deep learning models in previous studies.
Figure 6. Distribution of TinyML devices in previous studies.
Summary of TinyML devices used in different previous studies.
| Processor | Flash Memory | RAM | Processor Speed (MHz) |
|---|---|---|---|
| STM32-L476RG | 1 MB | 128 KB | 80 MHz |
| STM32-H743VI | 2 MB | 1 MB | 480 MHz |
| STM32 Nucleo-64 F091RC | 256 KB | 32 KB | 48 (max: 48) |
| STM32 Nucleo-64 F303RE | 512 KB | 80 KB | 72 (max: 72) |
| STM32 Nucleo-64 F401RE | 512 KB | 96 KB | 84 (max: 84) |
| STM32 Nucleo-144 F746ZG | 1 MB | 340 KB | 96 (max: 216) |
| STM32 Nucleo-144 H743ZI2 | 2 MB | 1 MB | 96 (max: 480) |
| STM32 Nucleo-64 L452RE | 512 KB | 160 KB | 80 (max: 80) |
| STM32H747I-Disco_CPU (ARM Cortex M4+ ARM Cortex M7) | 1 MB | 2 MB | 240 MHz (M4) + 480 MHz (M7) |
| STM32H743VI | 2 MB | 1 MB | 400 MHz |
| GAP 8 based PULP architecture | 512 KB | 80 KB | 22.65 |
| NXP Semiconductors FRDM-K64F | 1 MB | 256 KB | 120 MHz |
| ATMEGA4809 | 48 KB | 6 KB | 20 MHz |
| Arm CPU Cortex-M4 | 0.38 MB | 1 MB | 96 MHz |
| Xtensa DSP HiFi Mini | 1 MB | 1 MB | 10 MHz |
| STM32H743 SoC- ARM Cortex- M7 | 2 MB | 512 KB | 480 MHz |
| Sparkfun Edge (Ambiq Apollo3), Arm CPU Cortex-M4 | 1 MB | 0.38 MB | 96 MHz |
| Tensilica HiFi, Xtensa DSP HiFi Mini processor | 1 MB | 1 MB | 10 MHz |
| ESP32 | 448 KB | 520 KiB SRAM | 160 MHz–240 MHz |
| Taiyo Yuden EYSHSNZWZ NRF52 | 512 KB | 64 KB | 2402 MHz–2480 MHz |
| OpenMV Cam H7—Processor (ARM Cortex M7 480 MHz) | 2 MB | 1 MB | 480 MHz |
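
A first-pass question when choosing a device from the table above is whether a model's stored size fits in flash at all. The sketch below is a rough illustration under loud assumptions: it compares only model size against flash capacity, ignores the runtime code, activation buffers and RAM limits, treats 1 MB as 1024 KB, and uses a hypothetical `reserved_fraction` parameter for space held back for firmware. Device figures are copied from the table; the model sizes in the usage lines are taken from the earlier results table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    flash_kb: float
    ram_kb: float


# Flash and RAM as listed in the table above (1 MB taken as 1024 KB).
DEVICES = [
    Device("ATMEGA4809", 48, 6),
    Device("STM32-L476RG", 1024, 128),
    Device("STM32-H743VI", 2048, 1024),
]


def fits_in_flash(model_kb: float, device: Device, reserved_fraction: float = 0.0) -> bool:
    """True if the model fits in the flash left after the reserved share is removed."""
    if not 0.0 <= reserved_fraction < 1.0:
        raise ValueError("reserved_fraction must be in [0, 1)")
    usable_kb = device.flash_kb * (1.0 - reserved_fraction)
    return model_kb <= usable_kb


def candidate_devices(model_kb: float, reserved_fraction: float = 0.0) -> list[str]:
    """Names of the listed devices whose flash can hold a model of the given size."""
    return [d.name for d in DEVICES if fits_in_flash(model_kb, d, reserved_fraction)]


print(candidate_devices(21.6))        # TinySpeech-Z model size in KB
print(candidate_devices(185))         # CNN1 model size in KB
print(candidate_devices(1.5 * 1024))  # CNN model size, 1.5 MB in KB
```

A real deployment check would also need the peak RAM use of the model and the footprint of the inference runtime, neither of which is modeled here.
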