| Literature DB >> 35408281 |
Carlos Poncinelli Filho1, Elias Marques1, Victor Chang2, Leonardo Dos Santos1, Flavia Bernardini1, Paulo F Pires1, Luiz Ochi1, Flavia C Delicato1.
Abstract
Distributed edge intelligence is a disruptive research area that enables the execution of machine learning and deep learning (ML/DL) algorithms close to where data are generated. Since edge devices are more limited and heterogeneous than typical cloud devices, many hindrances have to be overcome to fully extract the potential benefits of such an approach (such as data-in-motion analytics). In this paper, we investigate the challenges of running ML/DL on edge devices in a distributed way, paying special attention to how techniques are adapted or designed to execute on these restricted devices. The techniques under discussion pervade the processes of caching, training, inference, and offloading on edge devices. We also explore the benefits and drawbacks of these strategies.
Keywords: Internet of Things; artificial intelligence; distributed; edge intelligence; fog intelligence; machine learning
Year: 2022 PMID: 35408281 PMCID: PMC9002674 DOI: 10.3390/s22072665
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1. A schematic overview of the organization (structure) of this paper.
Comparison of existing surveys.
| Paper | Challenges (of 6) | Technique Groups (of 8) | Application Domains (of 6) |
|---|---|---|---|
| Al-Rakhami et al. [ | 0/6 | 2/8 | 1/6 |
| Wang et al. [ | 1/6 | 4/8 | 4/6 |
| Verbraeken et al. [ | 1/6 | 0/8 | 0/6 |
| Zhou et al. [ | 2/6 | 4/8 | 0/6 |
| Dianlei Xu et al. [ | 6/6 | 3/8 | 0/6 |
| Our work | 6/6 | 8/8 | 6/6 |
Research Questions (RQs).
| ID | Research Question | Goal |
|---|---|---|
| RQ1 | What are the main challenges and open issues in the distributed learning field? | To obtain an understanding of the main challenges and open issues in the distributed learning field. |
| RQ2 | What are the techniques and strategies currently used in distributed learning? | To characterize techniques and strategies used in distributed learning. |
| RQ3 | What are the frameworks currently used in distributed learning? | To characterize frameworks used in distributed learning. |
| RQ4 | What are the different application domains of edge intelligence? | To characterize the different application domains of edge intelligence. |
Criteria adopted to include papers in the study.
| ID | Inclusion Criteria |
|---|---|
| IC1 | The study presents or discusses opportunities or challenges to run ML at the edge |
| IC2 | The study presents or discusses applications of ML at the edge |
| IC3 | The study presents or discusses techniques, strategies and/or frameworks that enable ML to run at the edge of the network |
Criteria adopted to exclude papers from the study.
| ID | Exclusion Criteria |
|---|---|
| EC1 | The study is not related to Edge/Fog Computing |
| EC2 | The study is not related to distributed ML in Edge/Fog Computing |
| EC3 | The study is a previous version of a more complete study about the same research |
| EC4 | The study was not approved according to the relevance criteria |
Figure 2. Number of papers excluded at each step of the SLR.
Challenges in distributed machine learning in edge computing.
| ID | Challenge |
|---|---|
| CH1 | Running ML/DL on devices with limited resources |
| CH2 | Ensuring energy efficiency without compromising accuracy |
| CH3 | Communication efficiency |
| CH4 | Ensuring data privacy and security |
| CH5 | Handling failure in edge devices |
| CH6 | Heterogeneity and low quality of data |
References to the challenges of Edge Intelligence.
| Challenge | References | Works That Tackle the Challenge |
|---|---|---|
| CH1 | [ | [ |
| CH2 | [ | [ |
| CH3 | [ | [ |
| CH4 | [ | [ |
| CH5 | [ | – |
| CH6 | [ | [ |
Figure 3. Model partitioning.
Figure 4. Model compression example.
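Several of the frameworks discussed later (e.g., TensorFlow Lite, QNNPACK) rely on the weight-quantization idea behind this compression example. A minimal sketch of 8-bit affine quantization, in plain NumPy with hypothetical helper names (not any surveyed framework's API), illustrates why it shrinks storage roughly 4× while keeping the reconstruction error bounded by the quantization step:

```python
import numpy as np

def quantize_int8(w):
    """Affine (asymmetric) quantization of a float32 tensor to int8.

    Returns the quantized tensor plus the (scale, zero_point) pair
    needed to dequantize it."""
    w_min, w_max = float(w.min()), float(w.max())
    # Map [w_min, w_max] onto the int8 range [-128, 127].
    scale = (w_max - w_min) / 255.0 if w_max > w_min else 1.0
    zero_point = int(round(-128 - w_min / scale))
    q = np.clip(np.round(w / scale) + zero_point, -128, 127).astype(np.int8)
    return q, scale, zero_point

def dequantize_int8(q, scale, zero_point):
    """Recover an approximate float32 tensor from int8 values."""
    return (q.astype(np.float32) - zero_point) * scale

rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 64)).astype(np.float32)

q, scale, zp = quantize_int8(weights)
recovered = dequantize_int8(q, scale, zp)

# int8 storage is 4x smaller than float32 (4096 vs 16384 bytes here),
# and the worst-case reconstruction error stays below one scale step.
print(q.nbytes, weights.nbytes)
print(float(np.abs(weights - recovered).max()) < scale)  # True
```

In deployed systems the (scale, zero\_point) pair is stored per tensor (or per channel) alongside the int8 weights, which is what lets libraries such as QNNPACK run the arithmetic directly on 8-bit operands.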
References considering techniques and/or strategies of edge intelligence implementations.
| Techniques | Works |
|---|---|
| Federated Learning | [ |
| Model Partitioning | [ |
| Model Right-sizing | [ |
| Edge Pre-Processing | [ |
| Scheduling | [ |
| Cloud Pre-Training | [ |
| Edge Only | [ |
| Model Compression | [ |
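Among the techniques above, model partitioning splits a network's layers between the constrained device and a more capable server. The toy NumPy sketch below (hypothetical MLP, split point, and function names; not any surveyed framework's API) shows that running the first layers on the edge and the remaining layers in the cloud reproduces the monolithic forward pass exactly:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(x):
    return np.maximum(x, 0.0)

# Toy 4-layer MLP; weights would normally come from a trained model.
layers = [rng.normal(scale=0.1, size=(16, 32)),
          rng.normal(scale=0.1, size=(32, 32)),
          rng.normal(scale=0.1, size=(32, 32)),
          rng.normal(scale=0.1, size=(32, 8))]

def forward(x, weights):
    for w in weights:
        x = relu(x @ w)
    return x

split = 2  # partition point: layers [0, split) run on the edge device

def edge_part(x):
    # Runs on the constrained device; only the intermediate activation
    # crosses the network, not the raw input stream.
    return forward(x, layers[:split])

def cloud_part(h):
    # Runs on the edge server / cloud with the remaining layers.
    return forward(h, layers[split:])

x = rng.normal(size=(1, 16))
full = forward(x, layers)               # monolithic execution
partitioned = cloud_part(edge_part(x))  # edge + cloud execution

print(np.allclose(full, partitioned))  # True
```

Frameworks such as Neurosurgeon and Edgent automate the choice of the split point by profiling per-layer compute cost and activation size under the current network conditions.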
EI frameworks.
| Framework | Group of Techniques | Comments |
|---|---|---|
| Neurosurgeon [ | Model Partitioning | Lightweight scheduler to automatically partition DNN computation between edge devices and cloud at the granularity of NN layers |
| JointDNN [ | Model Partitioning | JointDNN provides an energy- and performance-efficient method of querying some layers on the mobile device and some layers on the cloud server. |
| H. Li et al. [ | Model Partitioning | They partition the NN layers, deploying the lower layers (closer to the input) on edge servers and the higher layers (closer to the output) in the cloud for offloaded processing. They also propose an offline and an online algorithm to schedule tasks on edge servers. |
| Musical Chair [ | Model Partitioning | Musical Chair alleviates the compute cost and overcomes the resource barrier by distributing computation across devices, using data parallelism and model parallelism. |
| AAIoT [ | Model Partitioning | Accurately segments NNs across multi-layer IoT architectures. |
| MobileNet [ | Model Compression | Presented by Google Inc., the two hyperparameters introduced allow the model builder to choose the right sized model for the specific application. |
| Squeezenet | Model Compression | It is a reduced DNN that achieves AlexNet-level accuracy with 50 times fewer parameters |
| Tiny-YOLO | Model Compression | Tiny-YOLO is a very lightweight NN and hence suitable for running on edge devices. For small numbers of classes, its accuracy is comparable to the standard AlexNet, but it is much faster. |
| BranchyNet | Right sizing | Open source DNN training framework that supports the early-exit mechanism. |
| TeamNet [ | Model Compression | TeamNet trains shallower models using similar but downsized architectures of a given SOTA (state-of-the-art) deep model. The master node compares its uncertainty with the workers' and selects the output with the least uncertainty as the final result. |
| OpenEI [ | Model Compression | The algorithms are optimized by compressing the model and quantizing the weights. The model selector chooses the most suitable model based on the developer's requirements (the default is accuracy) and the currently available computing resources. |
| TensorFlow Lite [ | Data Quantization | TensorFlow’s lightweight solution, which is designed for mobile and edge devices. It leverages many optimization techniques, including quantized kernels, to reduce the latency. |
| QNNPACK (Quantized Neural Networks PACKage) [ | Data Quantization | Developed by Facebook, QNNPACK is a mobile-optimized library for high-performance NN inference. It provides implementations of common NN operators on quantized 8-bit tensors. |
| ProtoNN [ | Model Compression | Inspired by k-Nearest Neighbors (KNN); can be deployed on edge devices with limited storage and computational power. |
| EMI-RNN [ | Right Sizing | It requires 72 times less computation than standard Long Short-Term Memory (LSTM) networks and improves accuracy by 1%. |
| CoreML [ | Model Compression | Published by Apple, it is a deep learning package optimized for on-device performance to minimize memory footprint and power consumption. Users are allowed to integrate the trained machine learning model into Apple products, such as Siri, Camera, and QuickType. |
| DroNet [ | Model Compression | The DroNet topology was inspired by residual networks and was reduced in size to minimize the bare image processing time (inference). The numerical representation of weights and activations reduces from the native one, 32-bit floating-point (Float32), down to a 16-bit fixed point one (Fixed16). |
| Stratum [ | Model Selector | Stratum can select the best model by evaluating a series of user-built models. A resource monitoring framework within Stratum keeps track of resource utilization and is responsible for triggering actions to elastically scale resources and migrate tasks, as needed, to meet the ML workflow's Quality of Service (QoS). ML modules can be placed on the edge or the cloud layer, depending on user requirements and capacity analysis. |
| Efficient distributed deep learning (EDDL) [ | Model Compression | A systematic and structured scheme based on balanced incomplete block design (BIBD) used in situations where the dataflows in DNNs are sparse. Vertical and horizontal model partition and grouped convolution techniques are used to reduce computation and memory. To speed up the inference, BranchyNet is utilized. |
| In-Edge AI [ | Federated Learning | Utilizes the collaboration among devices and edge nodes to exchange the learning parameters for better training and inference of the models. |
| Edgence [ | Blockchain | Edgence (EDGe + intelligENCE) is proposed to serve as a blockchain-enabled edge-computing platform to intelligently manage massive decentralized applications in IoT use cases. |
| FederatedAveraging (FedAvg) [ | Federated Learning | Combines local stochastic gradient descent (SGD) on each client with a server that performs model averaging. |
| SSGD [ | Federated Learning | System that enables multiple parties to jointly learn an accurate neural network model for a given objective without sharing their input datasets. |
| BlockFL [ | Blockchain | Mobile devices’ local model updates are exchanged and verified by leveraging blockchain. |
| Edgent [ | Model Partitioning | Adaptively partitions DNN computation between the device and edge, in order to leverage hybrid computation resources in proximity for real-time DNN inference. DNN right-sizing accelerates DNN inference through the early exit at a proper intermediate DNN layer to further reduce the computation latency. |
| PipeDream [ | Model Partitioning | PipeDream keeps all available GPUs productive by systematically partitioning DNN layers among them to balance work and minimize communication. |
| GoSGD [ | Gossip Averaging | Method to share information between different threads based on gossip algorithms and showing good consensus convergence properties. |
| Gossiping SGD [ | Gossip Averaging | Asynchronous method that replaces the all-reduce collective operation of synchronous training with a gossip aggregation algorithm. |
| GossipGraD [ | Gossip Averaging | Asynchronous communication of gradients for further reducing the communication cost. |
| INCEPTIONN [ | Data Quantization | Lossy-compression algorithm for floating-point gradients. The framework reduces the communication time by 70.9–80.7% and offers a 2.2–3.1× speedup over the conventional training system while achieving the same level of accuracy. |
| Minerva [ | Data Quantization | Quantization analysis minimizes bit widths without exceeding a strict prediction error bound. Compared to a 16-bit fixed-point baseline, Minerva reduces power consumption by 1.5×. Minerva identifies operands that are close to zero and removes them from the prediction computation such that model accuracy is not affected. Selective pruning further reduces power consumption by 2.0× on top of bit width quantization. |
| AdaDeep [ | Model Compression | Automatically selects a combination of compression techniques for a given DNN that will lead to an optimal balance between user-specified performance goals and resource constraints. AdaDeep enables up to 9.8× latency reduction, 4.3× energy efficiency improvement, and 38× storage reduction in DNNs while incurring negligible accuracy loss. |
| JALAD [ | Data Quantization | Data compression by jointly considering compression rate and model accuracy. A latency-aware deep decoupling strategy to minimize the overall execution latency is employed. Decouples a deep NN to run a part of it at edge devices and the other part inside the conventional cloud. |
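The FederatedAveraging entry above can be illustrated concretely: each client runs a few local SGD steps on its private data shard, and the server averages the resulting models, so raw data never leaves the devices. A toy NumPy sketch on synthetic linear-regression data (all names and hyperparameters hypothetical, not the paper's experimental setup):

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic linear-regression task; each client holds a private shard.
true_w = np.array([2.0, -3.0, 0.5])
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 3))
    y = X @ true_w + rng.normal(scale=0.05, size=50)
    clients.append((X, y))

def local_sgd(w, X, y, epochs=5, lr=0.05):
    """A few local gradient steps on one client's private data."""
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

# FederatedAveraging: clients train locally on private shards,
# the server only sees and averages the model parameters.
w_global = np.zeros(3)
for _ in range(20):
    local_models = [local_sgd(w_global.copy(), X, y) for X, y in clients]
    w_global = np.mean(local_models, axis=0)

print(np.allclose(w_global, true_w, atol=0.1))  # True
```

The communication-efficiency appeal (CH3) is visible even in this sketch: only the model vector is exchanged each round, independent of how much data each client holds.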
Figure 5. Edge Intelligence strategies.
Figure 6. EI application domains.
Application domains and corresponding works.
| Domains | Works That Approach the Theme |
|---|---|
| Industry (8) | [ |
| Surveillance (5) | [ |
| Security (4) | [ |
| Intelligent Transport Systems (ITS) (13) | [ |
| Health (14) | [ |
| Energy Management (4) | [ |
Figure 7. Publications by application domain.