
Edge Intelligence: the Confluence of Edge Computing and Artificial Intelligence

Shuiguang Deng, Senior Member, IEEE, Hailiang Zhao, Jianwei Yin, Schahram Dustdar, Fellow, IEEE, and Albert Y. Zomaya, Fellow, IEEE

Abstract—Along with the deepening development in communication technologies and the surge of mobile devices, a brand-new computation paradigm, Edge Computing, is surging in popularity. Meanwhile, Artificial Intelligence (AI) applications are thriving with the breakthroughs in deep learning and the upgrade of hardware architectures. Billions of bytes of data, generated at the network edge, put great demands on data processing and structural optimization. Therefore, there exists a strong demand to integrate Edge Computing and AI, which gives birth to Edge Intelligence. In this article, we divide Edge Intelligence into AI for edge (Intelligence-enabled Edge Computing) and AI on edge (Artificial Intelligence on Edge). The former focuses on providing a more optimal solution to the key concerns in Edge Computing with the help of popular and effective AI technologies, while the latter studies how to carry out the entire process of building AI models, i.e., model training and inference, on edge. This article focuses on giving insights into this new inter-disciplinary field from a broader vision and perspective. It discusses the core concepts and the research road-map, which should provide the necessary background for potential future research programs in Edge Intelligence.

Index Terms—Edge Intelligence, Edge Computing, Wireless Networking, Computation Offloading, Federated Learning.

I. INTRODUCTION

COMMUNICATION technologies are undergoing a new revolution, i.e., the advent of the 5th generation cellular wireless systems (5G). 5G brings enhanced mobile broadband (eMBB), Ultra-Reliable Low Latency Communications (URLLC) and massive Machine Type Communications (mMTC). With the proliferation of the Internet of Things (IoT), more and more data is created by widespread and geographically distributed mobile and IoT devices, rather than by the mega-scale cloud datacenters [1]. Specifically, according to a prediction by Ericsson, 45% of the 40 ZB of global internet data will be generated by IoT devices in 2024 [2]. Offloading such huge amounts of data from the edge to the cloud is intractable because it can lead to oppressive network congestion. Therefore, a more applicable way is handling user demands from the edge directly, which leads to the birth of a brand-new computation paradigm, (Mobile → Multi-access) Edge Computing [3]. The subject of Edge Computing spans many concepts and technologies in diverse disciplines, including Service-oriented Computing (SOC), Software-defined Networking (SDN), Computer Architecture, etc. The principle of Edge Computing is to push computation and communication resources from the cloud to the edge of networks to provide services and perform computations, avoiding unnecessary communication latency and enabling faster responses for end users. Edge Computing is booming now.

S. Deng, H. Zhao and J. Yin are with the College of Computer Science and Technology, Zhejiang University, 310058 Hangzhou, China. E-mail: {dengsg, hliangzhao, zjuyjw}@zju.edu.cn.

S. Dustdar is with the Distributed Systems Group, Technische Universität Wien, 1040 Vienna, Austria. E-mail: [email protected].

A. Y. Zomaya is with the School of Computer Science, University of Sydney, Sydney, NSW 2006, Australia. E-mail: [email protected].

No one can deny that Artificial Intelligence (AI) is developing at an unprecedented pace nowadays. Big data processing necessitates more powerful methods, i.e., AI technologies, for extracting insights that lead to better decisions and strategic business moves. In the last decade, with the huge success of AlexNet, Deep Neural Networks (DNNs), which can learn deep representations of data, have become the most popular machine learning architectures. Deep learning, represented by DNNs and their ramifications, i.e., Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) and Generative Adversarial Networks (GANs), is the most advanced AI technology. Deep learning has made striking breakthroughs in a wide spectrum of fields, including computer vision, speech recognition, natural language processing, and board games. Besides, hardware architectures and platforms keep upgrading at a rapid rate, which makes it possible to satisfy the requirements of computation-intensive deep learning models. Application-specific accelerators are designed for further improvements in throughput and energy efficiency. In conclusion, driven by the breakthroughs in deep learning and the upgrade of hardware architectures, AI is undergoing sustained prosperity and development.

Considering that AI is functionally necessary for quickly analyzing huge volumes of data and extracting insights, there exists a strong demand to integrate Edge Computing and AI, which gives birth to Edge Intelligence. Edge Intelligence is not the simple combination of Edge Computing and AI. Its subject is vast and enormously sophisticated, covering many concepts and technologies that are interwoven in a complicated manner. Currently, a formal, internationally acknowledged definition of Edge Intelligence does not exist. To deal with this problem, some researchers have put forward their own definitions. For example, Zhou et al. believe that the scope of Edge Intelligence should not be restricted to running AI models solely on edge servers or devices, but should cover the collaboration of edge and cloud [4]. They define six levels of Edge Intelligence, from cloud-edge co-inference (level 1) to all on-device (level 6). Zhang et al. define Edge Intelligence as the capability to enable edges to execute AI algorithms [5].



In this paper, we propose to establish a broader vision and perspective. We suggest distinguishing Edge Intelligence into AI for edge and AI on edge.

1) AI for edge is a research direction focusing on providing better solutions to the constrained optimization problems in Edge Computing with the help of popular and effective AI technologies. Here, AI is used to endow the edge with more intelligence and optimality. Therefore, it can be understood as Intelligence-enabled Edge Computing (IEC).

2) AI on edge studies how to carry out the entire process of building AI models on edge. It is a paradigm of running AI model training and inference with device-edge-cloud synergy, which aims at extracting insights from massive and distributed edge data while satisfying requirements on algorithm performance, cost, privacy, reliability, efficiency, etc. Therefore, it can be interpreted as Artificial Intelligence on Edge (AIE).

Edge Intelligence, currently in its early stage, is attracting more and more researchers and companies from all over the world. To disseminate the recent advances of Edge Intelligence, Zhou et al. have conducted a comprehensive and concrete survey of recent research efforts on Edge Intelligence [4]. They survey the architectures, enabling technologies, systems, and frameworks from the perspective of AI models' training and inference. However, the material in Edge Intelligence spans an immense and diverse spectrum of literature, in origin and in nature, which is not fully covered by this survey. Many concepts are still unclear and many questions remain unsolved. This state of affairs motivates us to write this article to provide possibly enlightening insights with a simple and clear classification.

We commit ourselves to studying Edge Intelligence in depth from a broader vision and perspective. In Section II, we discuss the relation between Edge Computing and AI. In Section III, we concisely demonstrate the research road-map of Edge Intelligence with a hierarchical structure. Section IV and Section V elaborate the state of the art and grand challenges of AI for edge and AI on edge, respectively. Section VI concludes the article.

II. THE RELATIONS BETWEEN EDGE COMPUTING AND AI

We believe that the confluence of AI and Edge Computing is natural and inevitable. In effect, there is an interactive relationship between them. Edge Intelligence develops in the process of interaction and mutual promotion between Edge Computing and AI. On one hand, AI provides Edge Computing with technologies and methods, and Edge Computing is unleashing its potential at scale with AI; on the other hand, Edge Computing provides AI with scenarios and platforms, and AI broadly flourishes with Edge Computing.

AI provides Edge Computing with technologies and methods. In general, Edge Computing is a distributed computation paradigm, where software-defined networks have to be built to decentralize data and provide services with robustness and elasticity. Edge Computing faces resource allocation problems in different layers, such as CPU cycle frequency, access jurisdiction, radio frequency, bandwidth, and so on. As a result, it has great demands for various powerful optimization tools to enhance system efficiency. AI technologies are competent to take on this task. Essentially, AI models abstract unconstrained optimization problems from real scenarios and then find asymptotically optimal solutions iteratively with Stochastic Gradient Descent (SGD) methods. Either statistical learning methods or deep learning methods can offer help and advice for the edge. Besides, reinforcement learning, including multi-armed bandit theory, multi-agent learning and Deep Q-Networks (DQN), is playing an increasingly important role in resource allocation problems for the edge.
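To make the iterative optimization concrete, the following minimal Python sketch (our own illustration; the function name and the least-squares objective are hypothetical, not drawn from any cited work) runs the kind of mini-batch SGD these methods build on:

    import numpy as np

    def sgd_least_squares(X, y, lr=0.01, epochs=100, batch=32, seed=0):
        # Minimize (1/n) * ||X w - y||^2 with mini-batch SGD, a toy
        # stand-in for the iterative solvers referred to above.
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(epochs):
            idx = rng.permutation(n)
            for start in range(0, n, batch):
                b = idx[start:start + batch]
                grad = 2.0 * X[b].T @ (X[b] @ w - y[b]) / len(b)
                w -= lr * grad
        return w

    # Usage: w = sgd_least_squares(np.random.randn(200, 5), np.random.randn(200))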

Edge Computing provides AI with scenarios and platforms. The surge of IoT devices makes the Internet of Everything (IoE) a reality [6]. More and more data is created by widespread and geographically distributed mobile and IoT devices, rather than by the mega-scale cloud datacenters. Many application scenarios, such as intelligent networked vehicles, autonomous driving, smart homes, smart cities and real-time data processing in public security, can greatly facilitate the realization of AI from theory to practice. Besides, AI applications with high communication quality and low computational power requirements can be migrated from cloud to edge. In a word, Edge Computing provides AI with a heterogeneous platform full of variety. Nowadays, it is gradually becoming possible for AI chips with computational acceleration, such as Field Programmable Gate Arrays (FPGAs), Graphics Processing Units (GPUs), Tensor Processing Units (TPUs) and Neural Processing Units (NPUs), to be integrated with intelligent mobile devices. More and more corporations participate in the design of chip architectures to support the edge computation paradigm and engage in DNN acceleration on resource-limited IoT devices. The hardware upgrade on the edge also injects vigor and vitality into AI.

III. RESEARCH ROAD-MAP OF EDGE INTELLIGENCE

The architectural layers in the Edge Intelligence road-map, depicted in Fig. 1, describe a logical separation for the two directions respectively, i.e., AI for edge (left) and AI on edge (right). By a bottom-up approach, we divide research efforts in Edge Computing into Topology, Content, and Service; AI technologies can be utilized in all of them. By top-down decomposition, we divide the research efforts in AI on edge into Model Adaptation, Framework Design and Processor Acceleration. Before discussing AI for edge and AI on edge separately, we first describe the goal to be optimized for both of them, collectively known as Quality of Experience (QoE). QoE sits at the top of the road-map.

A. Quality of Experience

We believe that QoE should be application-dependent and determined by jointly considering multiple criteria covering Performance, Cost, Privacy (Security), Efficiency and Reliability.

Fig. 1. The research road-map of Edge Intelligence. [The figure's left branch (AI for edge, built bottom-up) spans Topology (edge site orchestration; wireless networking with data acquisition and network planning), Content (data provisioning; service provisioning, placement, composition and caching) and Service (computation offloading, user profile migration, mobility management); the right branch (AI on edge, built by top-down decomposition) spans Model Adaptation (model compression, conditional computation, algorithm asynchronization, thorough decentralization), Framework Design (model training and inference, federated learning, knowledge distillation, partitioning/splitting) and Processor Acceleration (instruction set design, parallel computation, near-data processing).]

1) Performance. The ingredients of performance differ for AI for edge and AI on edge. For the former, performance indicators are problem-dependent. For example, performance could be the ratio of successful offloadings when it comes to computation offloading problems, or the service providers' need-to-be-maximized revenue and need-to-be-minimized cost of hiring Base Stations (BSs) when it comes to service placement problems. For the latter, performance mainly consists of training loss and inference accuracy, which are the most important criteria for AI models. Although the computation scenario has changed from cloud clusters to the synergetic system of device, edge, and cloud, these criteria still play important roles.

2) Cost. Cost usually consists of computation cost, communication cost, and energy consumption. Computation cost reflects the demand for computing resources, such as the achieved CPU cycle frequency and allocated CPU time, while communication cost reflects the demand for communication resources, such as power, frequency band and access time. Many works also focus on minimizing the delay (latency) caused by the allocated computation and communication resources. Energy consumption is not unique to Edge Computing but is more crucial here due to the limited battery capacity of mobile devices. Cost reduction matters because Edge Computing promises a dramatic reduction in delay and energy consumption by tackling the key challenges for realizing 5G.

3) Privacy (Security). With increased awareness of leaks of public data, privacy preservation has become one of the hottest topics in recent years. This status quo led to the birth of Federated Learning, which aggregates local machine learning models from distributed devices while preventing data leakage [7]. Security is closely tied to privacy preservation. It is also associated with the robustness of the middleware and software of edge systems, which are not considered in this article.

4) Efficiency. Whether for AI for edge or AI on edge, high efficiency promises a system with excellent performance and low overhead. The pursuit of efficiency is the key factor in improving existing algorithms and models, especially for AI on edge. Many approaches, such as model compression, conditional computation, and algorithm asynchronization, have been proposed to improve the efficiency of training and inference of deep AI models.

5) Reliability. System reliability ensures that Edge Computing will not fail throughout any prescribed operating period. It is an important indicator of user experience. For Edge Intelligence, system reliability appears particularly important for AI on edge, because model training and inference are usually carried out in a decentralized and synchronized way, and the participating local users have a significant probability of failing to complete the model upload and download due to wireless network congestion.

B. A Recapitulation of IEC

The left side of the road-map, depicted in Fig. 1, is AI for edge. We name this kind of work IEC (i.e., Intelligence-enabled Edge Computing) because AI provides powerful tools for solving complex learning, planning, and decision-making problems. By a bottom-up approach, the key concerns in Edge Computing are categorized into three layers, i.e., Topology, Content, and Service.

For Topology, we pay close attention to the Orchestration of Edge Sites (OES) and Wireless Networking (WN). In this article, we define an edge site as a micro data center with applications deployed, attached to a Small-cell Base Station (SBS). OES studies where to deploy and install wireless telecom equipment and servers. In recent years, research efforts on the management and automation of Unmanned Aerial Vehicles (UAVs) have come into vogue [8] [9]. UAVs with a small server and an access point can be regarded as moving edge servers with strong maneuverability. Therefore, many works explore scheduling and trajectory problems that minimize the energy consumption of UAVs. WN studies Data Acquisition and Network Planning. The former concentrates on fast acquisition of rich but highly distributed data at subscribed edge devices, while the latter concentrates on network scheduling, operation and management. Fast data acquisition includes multiple access, radio resource allocation, and signal encoding/decoding. Network planning studies efficient management with protocols and middleware. In recent years, there has been an increasing trend toward intelligent networking, which builds intelligent wireless communication mechanisms with popular AI technologies. For example, Zhu et al. propose Learning-driven Communication, whose main principle is exploiting the coupling between communication and learning in edge learning systems [10].

For Content, we place emphasis on Data Provisioning, Service Provisioning, Service Placement, Service Composition and Service Caching. For data and service provisioning, the available resources can be provided by remote cloud datacenters and edge servers. In recent years, there have been research efforts on constructing lightweight QoS-aware service-based frameworks [11] [12]. The shared resources can also come from mobile devices if a proper incentive mechanism is employed. Service placement is an important complement to service provisioning, which studies where and how to deploy complex services on possible edge sites. In recent years, many works have studied service placement from the perspective of Application Service Providers (ASPs). For example, [13] tries to deploy services on basic communication and computation infrastructures under a limited budget. After that, multi-armed bandit theory, a branch of reinforcement learning, was adopted to optimize the service placement decision. Service composition studies how to select candidate services for composition in terms of the energy consumption and Quality of Service (QoS) of mobile end users [14] [15] [16]. It opens research opportunities where AI technologies can be utilized to generate better service selection schemes. Service caching can also be viewed as a complement to service provisioning. It studies how to design a caching pool to store frequently visited data and services. Service caching can also be studied in a cooperative way [17]. It opens research opportunities where multi-agent learning can be utilized to optimize QoE in large-scale edge computing systems.

For Service, we focus on Computation Offloading, User Profile Migration, and Mobility Management. Computation offloading studies the load balancing of various computational and communication resources through edge server selection and frequency spectrum allocation. More and more research efforts focus on dynamically managing the radio and computational resources of multi-user multi-server edge computing systems utilizing Lyapunov optimization technologies [18] [19]. In recent years, optimizing computation offloading decisions via DQN has become popular [20] [21]. This line of work models the computation offloading problem as a Markov Decision Process (MDP) and maximizes the long-term utility performance. The utility can be composed of the QoE indicators above and evolves according to the iterative Bellman equation. After that, asymptotically optimal computation offloading decisions are achieved based on a Deep Q-Network. User profile migration studies how to adjust the placement of user profiles (configuration files, private data, logs, etc.) when mobile users are in constant motion. User profile migration is often associated with mobility management [22]. In [23], the proposed JCORM algorithm jointly optimizes computation offloading and migration by formulating cooperative networks. It opens research opportunities where more advanced AI technologies can be utilized to increase optimality. Many existing research efforts study mobility management from the perspective of statistics and probability theory; there is strong interest in realizing mobility management with AI.

C. A Recapitulation of AIE

The right side of the road-map is AI on edge. We name this kind of work AIE (i.e., Artificial Intelligence on Edge) since it studies how to carry out the training and inference of AI models on the network edge. By top-down decomposition, we divide the research efforts in AI on edge into three categories: Model Adaptation, Framework Design and Processor Acceleration. Considering that the research efforts in Model Adaptation are based on existing training and inference frameworks, let us introduce Framework Design first.

1) Framework Design: Framework Design aims at providing a better training and inference architecture for the edge without modifying the existing AI models. Researchers attempt to design new frameworks for both Model Training and Model Inference.

For Model Training: To the best of our knowledge, all proposed frameworks for model training are distributed, except the knowledge distillation-based ones. The distributed training frameworks can be divided into data split and model split [24]. Data split can be further divided into master-device split, helper-device split and device-device split. The differences lie in where the training samples come from and how the global model is assembled and aggregated. Model split separates a neural network's layers and deploys them on different devices; it relies heavily on sophisticated pipelining. Knowledge distillation-based frameworks may or may not be decentralized, and they rely on transfer learning technologies [25]. Knowledge distillation can enhance the accuracy of shallow student networks. It first trains a basic network on a basic dataset; the learned features are then transferred to student networks, each trained on its own dataset. The basic network can be trained on the cloud or an edge server, while the student networks can be trained by numerous mobile end devices with their private data. We believe that there are great avenues to be explored in knowledge distillation-based frameworks for model training on the edge. The most popular work in model training is Federated Learning [7]. Federated Learning is proposed to preserve privacy when training DNNs in a decentralized way. Without aggregating user private data to a central datacenter, Federated Learning trains a series of local models on multiple clients. A global model is then optimized by averaging the trained gradients of each client. We will not elaborate on Federated Learning thoroughly; for more details please refer to [7]. A minimal sketch of the aggregation step appears below.
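The following Python sketch shows a FedAvg-style server-side aggregation under common simplifying assumptions (one flat weight vector per client; the function and variable names are ours, not from [7]):

    import numpy as np

    def federated_averaging(client_weights, client_sizes):
        # Global model = sample-size-weighted average of the local
        # models; raw training data never leaves the clients.
        total = float(sum(client_sizes))
        return sum(w * (n / total)
                   for w, n in zip(client_weights, client_sizes))

    # One communication round, sketched:
    # 1. the server broadcasts the global weights to selected clients;
    # 2. each client runs a few local SGD epochs on its private data;
    # 3. clients upload their updated weights (not the data);
    # 4. the server calls federated_averaging() and repeats.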

For Model Inference: Although model splitting is hard to realize for model training, it is a popular approach for model inference. To the best of our knowledge, model splitting/partitioning is the only approach that can be viewed as a framework for model inference. Other approaches, such as model compression, input filtering and early exit, can only be viewed as adaptations of existing frameworks; they are introduced in the next paragraph and elaborated on carefully in Subsection V-A. A typical example of model inference on edge is [26], where a DNN is split into two parts and carried out collaboratively: the computation-intensive part runs on the edge server while the other runs on the mobile device. The problem lies in where to split the layers and when to exit the intricate DNN subject to the constraint on inference accuracy.
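The "where to split" decision can be phrased as a small search over candidate partition points. A hedged Python sketch under assumed per-layer latency and feature-size profiles (all names and the profiling inputs are hypothetical, not the method of [26]):

    def best_split(device_ms, server_ms, feature_kb, uplink_kbps):
        # device_ms[i] / server_ms[i]: latency of layer i on the mobile
        # device / edge server; feature_kb[k]: size of the tensor that
        # must be uploaded if the first k layers run on the device
        # (feature_kb[0] is the raw input; feature_kb[L] is 0.0).
        L = len(device_ms)
        candidates = []
        for k in range(L + 1):  # k = 0: all on server; k = L: all on device
            tx_ms = feature_kb[k] * 8.0 / uplink_kbps * 1000.0
            total = sum(device_ms[:k]) + tx_ms + sum(server_ms[k:])
            candidates.append((total, k))
        latency, k = min(candidates)
        return k, latency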

2) Model Adaptation: Model Adaptation makes appropriate improvements to existing training and inference frameworks, usually Federated Learning, to make them more applicable to the edge. Federated Learning has the potential to be performed on the edge. However, its vanilla version has a strong demand for communication efficiency, since full local models are supposed to be sent back to the central server. Therefore, many researchers exploit more efficient model update and aggregation policies, and many works are devoted to reducing cost and increasing robustness while guaranteeing system performance. Methods to realize model adaptation include, but are not limited to, Model Compression, Conditional Computation, Algorithm Asynchronization and Thorough Decentralization. Model compression exploits the inherent sparsity structure of gradients and weights. Possible approaches include, but are not limited to, Quantization, Dimensionality Reduction, Pruning, Precision Downgrading, Components Sharing and Cutoff. These approaches can be realized by technologies such as Singular Value Decomposition (SVD), Huffman Coding, Principal Component Analysis (PCA) and other techniques. Conditional computation is an alternative way to reduce the amount of calculation by selectively turning off some unimportant computations of DNNs. Possible approaches include, but are not limited to, Components Shutoff, Input Filtering, Early Exit and Results Caching. Conditional Computation can be viewed as block-wise dropout [27]. Besides, random gossip communication can be utilized to reduce unnecessary calculations and model updates. Algorithm Asynchronization tries to aggregate local models in an asynchronous way; it is designed to overcome the inefficient and lengthy synchronous steps of model updates in Federated Learning. Thorough Decentralization removes the central aggregator to avoid any possible leakage and to address central server malfunction. Ways to achieve total decentralization include, but are not limited to, blockchain technologies and game-theoretic approaches.

3) Processor Acceleration: Processor Acceleration focuses on structural optimization of DNNs so that the frequently used, computation-intensive multiply-and-accumulate operations can be improved. Approaches to accelerating DNN computation on hardware include (1) designing special instruction sets for DNN training and inference, (2) designing highly parallel computing paradigms, and (3) moving computation closer to memory (near-data processing). The highly parallel computing paradigms can be divided into temporal and spatial architectures [28]. The former, such as CPUs and GPUs, can be accelerated by reducing the number of multiplications and increasing throughput. The latter can be accelerated by increasing data reuse with data flows. For example, [29] proposes an algorithm to accelerate CNN inference, which converts a set of pre-trained weights into values under a given precision. It also puts near-data processing into practice with an adaptive implementation of memristor crossbar arrays. In the research area of Edge Computing, many works are devoted to the co-design of Model Adaptation and Processor Acceleration. Considering that Processor Acceleration is mainly investigated by AI researchers, this article will not launch a careful discussion of it. More details on hardware acceleration for DNN processing can be found in [28].

IV. AI FOR EDGE

In Subsection III-B, we divided the key concerns in Edge Computing into three categories: Topology, Content and Service. That discussion presented a classification and possible research directions but did not provide an in-depth analysis of how to apply AI technologies to the edge to generate better solutions. This Section remedies that. Fig. 2 gives an example of how AI technologies are utilized in the Mobile Edge Computing (MEC) environment. First, we need to identify the problem to be studied. Taking performance optimization as an example, the optimization goal, decision variables, and potential constraints need to be confirmed. After that, the mathematical model should be constructed. Finally, we design an algorithm to solve the problem. In fact, the model construction is decided not only by the problem under study but also by the optimization algorithm to be applied. Taking DQN as an example, we have to model the problem as an MDP with finite states and actions. Thus, constraints cannot remain in the long-term optimization problem; the most common way is transforming those constraints into a penalty added to the optimization goal.

Fig. 2. The utilization of AI technology for performance optimization. [The figure's pipeline runs from problem definition (goal, e.g., need-to-be-minimized delay; decision variables, e.g., binary or partial task offloading, energy allocation, edge server selection, partition point; constraints, e.g., battery energy level, task execution deadline, radio-frequency bandwidth, computing resources) through model construction (observed states such as energy state, task request state and resource usage; discretized actions; constraints removed by adding penalties, transferring them to the goal, or adding assumptions) to algorithm design with DQN (a memory pool of samples, mini-batches, a DNN producing gradients and a policy, weight updating against the environment's states, actions and costs), with a refactoring loop back to model construction.]
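Written out, the constraint-to-penalty transformation takes the following generic form (our own notation, not drawn from any one cited paper):

    \min_{\pi} \; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} c(s_t, a_t)\Big]
    \quad \text{s.t.} \quad g(s_t, a_t) \le 0
    \qquad \Longrightarrow \qquad
    \min_{\pi} \; \mathbb{E}\Big[\sum_{t=0}^{\infty} \gamma^{t} \big( c(s_t, a_t) + \lambda \max\{0,\, g(s_t, a_t)\} \big)\Big],

where s_t and a_t are the (finite, discretized) state and action, c is the per-slot cost (e.g., delay plus energy consumption), g measures constraint violation (e.g., exceeding a task deadline), γ ∈ (0, 1) is the discount factor, and λ > 0 is a penalty weight tuned empirically.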

Considering that current research efforts on AI for edge concentrate on Wireless Networking, Service Placement, Service Caching and Computation Offloading, we focus only on these topics in the following Subsection. For research directions that remain untouched and uncharted, we look forward to more exciting works.

A. State of the Art

1) Wireless Networking: 5G promises eMBB, URLLC and mMTC in a real-time and highly dynamic environment. Under these circumstances, researchers have reached a consensus that AI technologies should and can be integrated across the wireless infrastructure and mobile users [30]. We believe that AI can be synergistically applied to realize intelligent network optimization in a fully online manner. One of the typical works is [10]. This paper advocates a new set of design principles for wireless communication on edge with machine learning technologies and models embedded, collectively named Learning-driven Communication. It can be applied across the whole process of data acquisition, which consists in turn of multiple access, radio resource management and signal encoding.

For learning-driven multiple access, the principle is that the unique characteristics of wireless channels should be exploited for functional computation. Over-the-air computation (AirComp) is a typical technique for realizing it [31] [32]. [33] puts this principle into practice based on Broadband Analog Aggregation (BAA). Concretely, [33] suggests that the simultaneously transmitted model updates in Federated Learning should be analog-aggregated by exploiting the waveform-superposition property of multi-access channels. The proposed BAA can dramatically reduce communication latency compared with traditional Orthogonal Frequency Division Multiple Access (OFDMA). [34] also explores over-the-air computation for model aggregation in Federated Learning. Concretely, [34] puts the principle into practice by modeling device selection and beamforming design as a sparse and low-rank optimization problem, which is a highly intractable combinatorial problem. To solve the problem with a fast convergence rate, the paper proposes a difference-of-convex-functions (DC) representation via successive convex relaxation. The numerical results show that the proposed algorithm achieves lower training loss and higher inference accuracy compared with state-of-the-art approaches. This contribution could also be categorized as Model Adaptation in AI on edge, but it accelerates Federated Learning from the perspective of fast data acquisition.

For learning-driven radio resource management, the principle is that radio resources should be allocated based on the value of the transmitted data, not just the efficiency of spectrum utilization. It can therefore be understood as importance-aware resource allocation, and an obvious approach is importance-aware retransmission. [35] puts this principle into practice. The paper proposes a retransmission protocol named importance-aware automatic-repeat-request (importance ARQ). Importance ARQ trades off signal-to-noise ratio (SNR) against data uncertainty under the desired learning accuracy. It can achieve fast convergence while avoiding the learning performance degradation caused by channel noise.

For learning-driven signal encoding, the principle is that signal encoding should be designed by jointly optimizing feature extraction, source coding, and channel encoding. A work that puts this principle into practice is [36], which proposes a Hybrid Federated Distillation (HFD) scheme based on separate source-channel coding and over-the-air computing. It adopts sparse binary compression with error accumulation in source-channel coding. For both digital and analog implementations over Gaussian multiple-access channels, HFD can outperform the vanilla version of Federated Learning in a poor communication environment. This principle has something in common with Dimensionality Reduction and Quantization from Model Adaptation in AI on edge, but it reduces the feature size at the source of data transmission. It opens great research opportunities for the co-design of learning frameworks and data encoding.
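Sparse binary compression with error accumulation can be sketched generically as a top-k/sign compressor with residual feedback (our own illustration of this family of techniques, not the exact scheme of [36]):

    import numpy as np

    class SparseBinaryCompressor:
        # Keep only the k largest-magnitude entries of a (flattened)
        # gradient, transmit their signs times one shared magnitude,
        # and carry the discarded residual into the next round.
        def __init__(self, k):
            self.k = k
            self.residual = None

        def compress(self, grad):
            if self.residual is None:
                self.residual = np.zeros_like(grad)
            g = grad + self.residual          # re-inject accumulated error
            idx = np.argsort(np.abs(g))[-self.k:]
            mu = np.mean(np.abs(g[idx]))      # single scalar magnitude
            sparse = np.zeros_like(g)
            sparse[idx] = mu * np.sign(g[idx])
            self.residual = g - sparse        # remember what was dropped
            return sparse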

Apart from Learning-driven Communication, some works contribute to AI for Wireless Networking from the perspective of power and energy consumption management. Shen et al. utilize Graph Neural Networks (GNNs) to develop scalable methods for power control in K-user interference channels [37]. The paper first models the K-user interference channel as a complete graph, then learns the optimal power control with a graph convolutional neural network. [38] studies an energy minimization problem in which the baseband processes of virtual small cells, powered solely by energy harvesters and batteries, can be opportunistically executed on a grid-connected edge server. Based on multi-agent learning, several distributed fuzzy Q-learning-based algorithms are tailored to the problem. This paper can be viewed as an attempt at coordination with broadcasting.

As we will expound later, Wireless Networking is often combined with Computation Offloading when studied in the form of optimization. The state of the art of these works is covered in Subsection IV-A3.

2) Service Placement and Caching: Many researchers study service placement from the perspective of Application Service Providers (ASPs). They model the data and service placement problem (the services may be composed and complicated) as a Markov Decision Process (MDP) and utilize AI technologies such as reinforcement learning to achieve the optimal placement decision. A typical work is [39]. This paper proposes a spatial-temporal algorithm based on the Multi-armed Bandit (MAB) and achieves optimal placement decisions while learning the benefit. Concretely, it studies how many SBSs should be rented for edge service hosting to maximize the expected utility up to a finite time horizon, where the expected utility is composed of the delay reduction of all mobile users. A MAB-based algorithm named SEEN is then proposed to learn the local users' service demand patterns at the SBSs. It automatically balances exploitation and exploration according to whether a set of SBSs has been chosen before. Another work that attempts to integrate AI technologies with service placement is [40]. This work jointly decides on which SBS to deploy each data block and service component, and how much harvested energy should be stored in mobile devices, with a DQN-based algorithm. This article will not elaborate on DQN thoroughly; more details can be found in [41].
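The exploration-exploitation balance in such MAB formulations follows a classic pattern. Below is a generic UCB1 selection rule for deciding which SBS to rent next (an illustrative sketch, not the SEEN algorithm of [39]):

    import math

    def ucb1_select(counts, mean_rewards, t):
        # counts[i]: times SBS i has been rented so far;
        # mean_rewards[i]: its observed average utility, e.g., delay
        # reduction; t: the current decision round (t >= 1).
        for i, n in enumerate(counts):
            if n == 0:            # try every unexplored SBS once first
                return i
        return max(range(len(counts)),
                   key=lambda i: mean_rewards[i]
                   + math.sqrt(2.0 * math.log(t) / counts[i]))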

Service caching can be viewed as a complement to service placement. Edge servers can be equipped with a dedicated service cache to satisfy user demands for popular contents. A wide range of optimization problems on service caching have been proposed to endow edge servers with learning capability. Sadeghi et al. study sequential fetch-cache decisions based on dynamic prices and user requests [17]. The paper endows SBSs with efficient fetch-cache decision-making schemes operating in dynamic settings. Concretely, it formulates a cost minimization problem that takes service popularity into account. For this long-term stochastic optimization problem, several computationally efficient algorithms are developed based on Q-learning.

3) Computation Offloading: Computation offloading may be the hottest topic in AI for edge. It studies the transfer of resource-intensive computational tasks from resource-limited mobile devices to the edge or the cloud. This process involves the allocation of many resources, ranging from CPU cycles to channel bandwidth. Therefore, AI technologies with strong optimization abilities have been used extensively in recent years. Among these AI technologies, Q-learning and its derivative, DQN, are in the spotlight. For example, [42] designs a Q-learning-based algorithm for computation offloading. Concretely, it formulates the computation offloading problem as a non-cooperative game in multi-user multi-server edge computing systems and proves that a Nash Equilibrium exists. The paper then proposes a model-free Q-learning-based offloading mechanism that helps mobile devices learn their long-term offloading strategies so as to maximize their long-term utilities.

More works are based on DQN because the curse of dimensionality can be overcome with non-linear function approximation. For example, [20] studies computation offloading for IoT devices with energy harvesting in multi-server MEC systems. The to-be-maximized utility is formed from the overall data sharing gains, task dropping penalty, energy consumption and computation delay, and is updated according to the Bellman equation; DQN is then used to generate the optimal offloading scheme. In [21] [43], the computation offloading problem is formulated as an MDP with finite states and actions, where the state set consists of the channel qualities, the energy queue and the task queue, and the action set consists of the offloading decisions in different time slots. A DQN-based algorithm is then proposed to minimize the long-term cost. Based on DQN, [44] [45] jointly optimize task offloading decisions and wireless resource allocation to maximize the data acquisition and analysis capability of the network. [46] studies the knowledge-driven service offloading problem for the Internet of Vehicles. The problem is also formulated as a long-term planning optimization problem and solved based on DQN. In summary, computation offloading problems in various industrial scenarios have been studied extensively from all sorts of perspectives.
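Stripped of the neural approximator, the update rule underlying these works is the Bellman iteration. A tabular Q-learning sketch for a discretized offloading MDP follows (the environment simulator, state/action encoding and all parameter values are our own hypothetical choices, not those of any cited paper):

    import numpy as np

    def q_learning_offloading(env_step, n_states, n_actions,
                              episodes=500, horizon=100,
                              alpha=0.1, gamma=0.9, eps=0.1, seed=0):
        # env_step(s, a) -> (next_state, cost): user-supplied simulator,
        # e.g., state = (channel quality, queue length) index and
        # action = local execution vs. offloading to a given server.
        rng = np.random.default_rng(seed)
        Q = np.zeros((n_states, n_actions))
        for _ in range(episodes):
            s = int(rng.integers(n_states))
            for _ in range(horizon):
                a = (int(rng.integers(n_actions)) if rng.random() < eps
                     else int(np.argmin(Q[s])))     # epsilon-greedy on cost
                s2, cost = env_step(s, a)
                # Bellman update toward cost + discounted best future cost
                Q[s, a] += alpha * (cost + gamma * Q[s2].min() - Q[s, a])
                s = s2
        return Q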

There also exist works that explore the task offloading problem with other AI technologies. For example, [47] proposes a long short-term memory (LSTM) network to predict task popularity and then formulates a joint optimization of the task offloading decisions, computation resource allocation and caching decisions. A Bayesian learning automata-based multi-agent learning algorithm is then proposed to achieve optimality.

B. Grand Challenges

Although it is popular to apply AI technologies to the edge to generate better solutions, there are many challenges to face. Below, we list grand challenges across the whole process of AI for edge research, which are, in turn, model establishment, algorithm deployment, and the balance between optimality and efficiency. These challenges are closely related, but each has its own emphasis.

1) Model Establishment: If we want to use AI technologies, the mathematical model built has to be limited and the formulated optimization problem cannot be unrestrained. On one hand, this is because the optimization basis of AI technologies, Stochastic Gradient Descent (SGD) and Mini-Batch Gradient Descent (MBGD) methods, may not work well if the original search space is constrained. On the other hand, especially for MDPs, the state set and action set cannot be infinite, and discretization is necessary to avoid the curse of dimensionality before further processing. The common solution is changing the constraints into a penalty and incorporating it into the global optimization goal. This status quo greatly restricts the establishment of mathematical models and leads to performance downgrades; it can be viewed as a compromise for the utilization of AI technologies. Therefore, establishing an appropriate system model faces great challenges.


2) Algorithm Deployment: State-of-the-art works often formulate a combinatorial, NP-hard optimization problem with fairly high computational complexity. Very few works can achieve an analytical, approximately optimal solution with convex optimization technologies and clever tricks. Actually, for AI for edge, the achieved solution mostly comes from iterative learning-based approaches, which face great challenges when deployed on the edge in an online manner. Besides, another often-ignored challenge is which edge device should undertake the responsibility of deploying and running the proposed complicated algorithms. Existing research efforts usually concentrate on their specific problems and do not provide details on this point.

3) Balance between Optimality and Efficiency: Although AI technologies can indeed provide solutions with optimality, the trade-off between optimality and efficiency cannot be ignored when it comes to the resource-constrained edge. Thus, how to improve the usability and efficiency of edge computing systems with embedded AI technologies for different application scenarios is a severe challenge. The trade-off should be struck based on the characteristics of the dynamically changing requirements on QoE and the structure of network resources. Therefore, it is coupled with the service subscribers' pursuit of better QoE and with the utilization of available resources.

V. AI ON EDGE

In Subsection III-C, we divided the research efforts in AI on edge into Model Adaptation, Framework Design and Processor Acceleration. Existing frameworks for model training and inference are rare: the training frameworks include Federated Learning and Knowledge Distillation, while the inference frameworks include Model Splitting and Model Partitioning. AI models on edge are far more limited than cloud-based ones because of the relatively limited compute and storage abilities of edge devices, and how to carry out model training and inference on resource-scarce devices is a serious issue. As a result, compared with designing new frameworks, researchers in Edge Computing are more interested in improving existing frameworks to make them more appropriate for the edge, usually by reducing resource occupation. Consequently, Model Adaptation based on Federated Learning has developed prosperously. As mentioned before, Processor Acceleration will not be elaborated in detail. Therefore, we focus only on Model Adaptation in the following Subsection. The state of the art is discussed across Model Compression, Conditional Computation, Algorithm Asynchronization, and Thorough Decentralization.

A. State of the Art

1) Model Compression: As demonstrated in Fig. 3, the approaches for Model Compression include Quantization, Dimensionality Reduction, Pruning, Components Sharing, Precision Downgrading and so on. They exploit the inherent sparsity structure of gradients and weights to reduce memory and channel occupation as much as possible. The technologies used to compress and quantize weights include, but are not limited to, Singular Value Decomposition (SVD), Huffman coding and Principal Component Analysis (PCA). This article will not launch a thorough introduction to them due to limited space. Considering that many works simultaneously utilize several of the approaches mentioned above, we do not further subdivide the state of the art in Model Compression. One more thing should be clearly noted: Model Compression is suitable for both Model Training and Model Inference, so we do not deliberately distinguish between them.

Fig. 3. Methods, approaches and technologies of Model Adaptation. [The figure maps each method to its approaches and technologies: Model Compression — exploit the inherent sparsity structure of gradients and weights — via quantization, dimensionality reduction, pruning, precision downgrading and components sharing (e.g., SVD, Huffman coding); Conditional Computation — selectively turn off some unimportant calculations — via components shutoff, input filtering, early exit and results caching (e.g., block-wise dropout); Algorithm Asynchronization — aggregate local models in an asynchronous way — via random gossip communication; Thorough Decentralization — remove the central aggregator to avoid any possible leakage — via smart contracts, participator selection and game theory.]
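As a concrete instance of SVD-based compression, a dense layer's weight matrix can be replaced by two thin factors. The following sketch is a generic illustration, not the scheme of any specific cited work:

    import numpy as np

    def svd_compress(W, rank):
        # Approximate a d_out x d_in weight matrix W by U_r @ V_r with
        # U_r: d_out x rank and V_r: rank x d_in, shrinking storage and
        # multiply-accumulate count from d_out*d_in to
        # rank*(d_out + d_in).
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        U_r = U[:, :rank] * s[:rank]   # fold singular values into U
        V_r = Vt[:rank, :]
        return U_r, V_r                # forward pass: x @ V_r.T @ U_r.T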

As mentioned before, communication efficiency is of the utmost importance for Federated Learning. Therefore, minimizing the number of communication rounds is the principal goal when we move Federated Learning to the edge, and many works are devoted to reducing its communication cost from various perspectives. In [48], structured updates and sketched updates are proposed to reduce the uplink communication costs: for structured updates, the local update is learned in a restricted lower-dimensional space; for sketched updates, the uploaded model is compressed before being sent to the central server. [49] designs a communication-efficient secure aggregation protocol for high-dimensional data. The protocol can tolerate up to 33.3% of the participating devices failing to complete the protocol, i.e., the system is robust to participating users dropping out. [50] argues that DNNs are typically over-parameterized, that their weights have significant redundancy, and that retraining can compensate for the performance lost to pruning. The paper thus proposes a retraining-after-pruning scheme: the DNN is retrained on new data while the pruned weights stay constant. The scheme can reduce resource occupation while guaranteeing learning accuracy. [51] exploits mixed low-bitwidth compression, determining the minimum bit precision of each activation and weight under given memory constraints. [52] uses Binarized Neural Networks (BNNs), which have binary weights and activations, to replace regular DNNs; this is a typical exploration of quantization. Analogously, [53] proposes hybrid network architectures combining binary and full-precision sections to achieve significant energy efficiency and memory compression with guaranteed performance. Thakker et al. study a compressed RNN cell implementation called Hybrid Matrix Decomposition (HMD) for model inference [54]. It divides the matrix of network weights into two parts: an unconstrained upper half and a lower half composed of rank-1 blocks, so that the output features are composed of a rich part (upper) and a barren part (lower). This is an imaginative compression trick compared with traditional pruning or quantization. The numerical results show that it not only achieves a faster run-time than pruning but also retains more model accuracy than matrix factorization.

Some works also explore model compression based on partitioned DNNs. For example, [55] proposes an auto-tuning neural-network quantization framework for collaborative inference between edge and cloud. The DNN is first partitioned; the first part is quantized and executed on the edge devices, while the second part is executed in the cloud with full precision. [56] proposes a framework to accelerate and compress model training and inference. It partitions DNNs into multiple sections according to their depth and constructs classifiers upon the intermediate features of the different sections. Besides, the accuracy of the classifiers is enhanced by knowledge distillation.

Apart from Federated Learning, some works probe into the execution of statistical learning models or other popular deep models, such as the ResNet and VGG architectures, on resource-limited end devices. For example, [57] proposes ProtoNN, a compressed and accurate k-Nearest Neighbor (kNN) algorithm. ProtoNN learns a small number of prototypes to represent the entire training set with Stochastic Neighborhood Compression (SNC) [58], and then projects the entire data into a lower dimension with a sparse projection matrix. It jointly optimizes the projection and the prototypes under an explicit model-size constraint. Chakraborty et al. propose Hybrid-Net, which has both binary and high-precision layers to reduce the degradation of learning performance [59]. Innovatively, this paper leverages PCA to identify significant layers in a binary network, rather than using it for dimensionality reduction; significance here is determined by a layer's ability to expand data into a higher-dimensional space.

Model Compression is currently the hottest direction in AI on edge because it is easy to get started with. However, state-of-the-art works are usually not tied to specific application scenarios of edge computing systems. We look forward to exciting works built on concrete edge platforms and hardware.

2) Conditional Computation: As demonstrated in Fig. 3, the approaches for Conditional Computation include Components Sharing, Components Shutoff, Input Filtering, Early Exit, Results Caching and so on. Put simply, Conditional Computation selectively turns off unimportant calculations, so it can be viewed as block-wise dropout [27]. Many works devote themselves to ranking and selecting the most worthy parts of the computation, or to stopping early once a confidence threshold is reached. For example, [60] instantiates a runtime-throttleable neural network which can adaptively balance learning accuracy and resource occupation in response to a control signal. It puts Conditional Computation into practice via block-level gating, as sketched below.
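One possible reading of such block-level gating follows; the gating policy (keep only the first blocks, in proportion to a control signal u) is our own simplifying assumption, not the policy learned in [60].

```python
import numpy as np

def throttled_forward(x, blocks, u):
    """u in [0, 1] throttles compute: off blocks cost nothing (identity skip)."""
    n_on = max(1, int(round(u * len(blocks))))   # number of blocks left active
    for i, w in enumerate(blocks):
        if i < n_on:
            x = x + np.maximum(0.0, w @ x)       # gate open: residual ReLU block
        # gate closed: skip the block entirely, saving its FLOPs
    return x

rng = np.random.default_rng(0)
blocks = [rng.standard_normal((16, 16)) * 0.05 for _ in range(8)]
full = throttled_forward(np.ones(16), blocks, u=1.0)   # full accuracy, full cost
lite = throttled_forward(np.ones(16), blocks, u=0.25)  # degraded but cheap
```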

This idea can also be applied to participator selection: choose the most valuable participators in Federated Learning for model updates, so that valueless participators do not engage in the aggregation of the global model. To the best of our knowledge, there is currently no work dedicated to participator selection; we eagerly look forward to exciting works on it.
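Since no published scheme exists yet, the following is a purely hypothetical illustration of what a value-based selection rule might look like, with the norm of a client's update standing in for its "value"; every name and threshold here is an assumption.

```python
import numpy as np

def select_and_aggregate(global_w, client_updates, keep_frac=0.5):
    """Hypothetical: aggregate only the top fraction of clients by update norm."""
    scores = [np.linalg.norm(u) for u in client_updates]  # crude proxy for value
    k = max(1, int(keep_frac * len(client_updates)))
    top = np.argsort(scores)[-k:]                         # most "valuable" clients
    return global_w + np.mean([client_updates[i] for i in top], axis=0)
```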

3) Algorithm Asynchronization: As demonstrated in Fig. 3, Algorithm Asynchronization attempts to aggregate local models asynchronously in Federated Learning. As we have mentioned before, the participating local users have a significant probability of failing to complete the model upload and download due to wireless network congestion. Apart from model compression, another remedy is exchanging weights and gradients peer-to-peer to reduce the high concurrency on wireless channels; Random-gossip Communication is a typical example. Based on randomized gossip algorithms, Blot et al. propose GoSGD to train DNNs asynchronously [61] (sketched below). The most challenging problem for gossip training is the degradation of the convergence rate in large-scale edge systems. To overcome this issue, Daily et al. introduce GossipGraD, which greatly reduces the communication complexity to ensure fast convergence [62].
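A bare-bones sketch of one random-gossip round in the spirit of GoSGD [61] follows; the single random pairing and the stub gradients are illustrative assumptions, and a real system would compute gradients from local data.

```python
import numpy as np

def gossip_round(weights, grads, lr=0.01, rng=np.random.default_rng(0)):
    """Each node takes a local SGD step; one random pair then averages weights."""
    for i in range(len(weights)):                 # local update on every node
        weights[i] = weights[i] - lr * grads[i]
    i, j = rng.choice(len(weights), size=2, replace=False)
    avg = 0.5 * (weights[i] + weights[j])         # peer-to-peer exchange, no server
    weights[i], weights[j] = avg.copy(), avg.copy()
    return weights
```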

4) Thorough Decentralization: As demonstrated in Fig. 3, Thorough Decentralization attempts to remove the central aggregator to avoid any possible leakage. Although Federated Learning does not require consumers' private data, the model updates still contain private information, and some trust in the server coordinating the training is still required. To avoid privacy leaks altogether, blockchain technology and game-theoretical approaches can assist in total decentralization.

By leveraging blockchain, especially smart contracts, the central server for model aggregation is no longer needed. As a result, collapses triggered by model aggregation can be avoided, and user privacy can be protected. We believe that blockchain-based Federated Learning will become a hot and prosperous direction in the coming years. There already exists work putting it into practice. In [63], the proposed blockchain-based federated learning architecture, BlockFL, takes edge nodes as miners. Miners exchange and verify all the local model updates contributed by each device and then run the Proof-of-Work (PoW). The miner who first completes the PoW generates a new block and receives the mining reward from the blockchain network. Finally, each device updates its local model from the freshest block. In this paper, blockchain is effectively integrated with Federated Learning to build a trustworthy edge learning environment.
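The round structure of BlockFL can be paraphrased as below. The hash-based proof-of-work, its difficulty, and the plain averaging of updates are simplifying assumptions on our part, not the protocol details of [63].

```python
import hashlib
import json
import numpy as np

def proof_of_work(payload, difficulty=4):
    """Find a nonce whose hash with the payload has `difficulty` leading zeros."""
    nonce = 0
    while not (hashlib.sha256(f"{payload}{nonce}".encode())
               .hexdigest().startswith("0" * difficulty)):
        nonce += 1
    return nonce

def blockfl_round(local_updates):
    """Miners verify updates and race on PoW; devices read the freshest block."""
    payload = json.dumps([u.tolist() for u in local_updates])  # verified updates
    block = {"updates": local_updates, "nonce": proof_of_work(payload)}
    return np.mean(block["updates"], axis=0)      # next model, no central server
```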

B. Grand Challenges

The grand challenges for AI on edge are listed from the perspectives of data availability, model selection, and coordination mechanism, respectively.

1) Data Availability: The toughest challenge lies in the availability and usability of raw training data, because usable data is the beginning of everything.


Firstly, a proper incentive mechanism may be necessary for data provisioning from mobile users; otherwise, the raw data may not be available for model training and inference. Besides, the raw data from various end devices can carry an obvious bias, which greatly affects learning performance. Although Federated Learning can mitigate the problem caused by non-i.i.d. samples to a certain extent, the training procedure still faces great difficulties in the design of robust communication protocols. Therefore, data availability poses huge challenges.

2) Model Selection: At present, the selection of AI models to be trained faces severe challenges, ranging from the models themselves to the training frameworks and hardware. Firstly, how should one select the befitting threshold of learning accuracy and the scale of AI models for quick deployment and delivery? Secondly, how should one select the proper training frameworks and accelerator architectures under limited resources? Model selection is coupled with resource allocation and management, which makes the problem complicated and challenging.

3) Coordination Mechanism: The proposed methods on Model Adaptation may not be universally serviceable, because there can be huge differences in computing power and communication resources between heterogeneous edge devices; the same method may therefore achieve different learning results on different clusters of mobile devices. Compatibility and coordination between heterogeneous edge devices are thus essential. A flexible coordination mechanism between cloud, edge, and device, in both hardware and middleware, is imperative and urgently needs to be designed. This opens research opportunities for a uniform API on edge learning for ubiquitous edge devices.

VI. CONCLUDING REMARKS

Edge Intelligence, although still in its early stage, has attracted more and more researchers and companies to study and use it. This article attempts to reveal research opportunities through a succinct and effective classification. Concretely, we first discuss the relation between Edge Computing and Artificial Intelligence; we believe that they promote and reinforce each other. After that, we divide Edge Intelligence into AI for edge and AI on edge and sketch the research road-map. The former focuses on providing better solutions to the key concerns in Edge Computing with the help of popular and effective AI technologies, while the latter studies how to carry out the training and inference of AI models on edge. For both, the research road-map is presented as a hierarchical architecture. Bottom-up, we divide the research efforts in Edge Computing into Topology, Content, and Service, and introduce examples of how to energize the edge with intelligence. By top-down decomposition, we divide the research efforts in AI on edge into Model Adaptation, Framework Design, and Processor Acceleration and introduce existing research results. Finally, we present the state of the art and the grand challenges in several hot topics for both AI for edge and AI on edge. We attempt to provide enlightening thoughts on the uncultivated land of Edge Intelligence, and we hope this article can stimulate fruitful discussions on potential future research programs in Edge Intelligence.

REFERENCES

[1] M. Asif-Ur-Rahman, F. Afsana, M. Mahmud, M. S. Kaiser, M. R. Ahmed, O. Kaiwartya, and A. James-Taylor, “Toward a heterogeneous mist, fog, and cloud-based framework for the internet of healthcare things,” IEEE Internet of Things Journal, vol. 6, no. 3, pp. 4049–4062, June 2019.

[2] Ericsson, “IoT connections outlook: NB-IoT and Cat-M technologies will account for close to 45 percent of cellular IoT connections in 2024,” 2019. [Online]. Available: https://www.ericsson.com/en/mobility-report/reports/june-2019/iot-connections-outlook

[3] N. Abbas, Y. Zhang, A. Taherkordi, and T. Skeie, “Mobile edge computing: A survey,” IEEE Internet of Things Journal, vol. 5, no. 1, pp. 450–465, Feb 2018.

[4] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, and J. Zhang, “Edge intelligence: Paving the last mile of artificial intelligence with edge computing,” Proceedings of the IEEE, vol. 107, no. 8, pp. 1738–1762, Aug 2019.

[5] X. Zhang, Y. Wang, S. Lu, L. Liu, L. Xu, and W. Shi, “OpenEI: An open framework for edge intelligence,” CoRR, vol. abs/1906.01864, 2019. [Online]. Available: http://arxiv.org/abs/1906.01864

[6] J. Lin, W. Yu, N. Zhang, X. Yang, H. Zhang, and W. Zhao, “A survey on internet of things: Architecture, enabling technologies, security and privacy, and applications,” IEEE Internet of Things Journal, vol. 4, no. 5, pp. 1125–1142, Oct 2017.

[7] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in AISTATS, 2016.

[8] J. Xu, Y. Zeng, and R. Zhang, “UAV-enabled wireless power transfer: Trajectory design and energy optimization,” IEEE Transactions on Wireless Communications, vol. 17, no. 8, pp. 5092–5106, Aug 2018.

[9] B. Li, Z. Fei, and Y. Zhang, “UAV communications for 5G and beyond: Recent advances and future trends,” IEEE Internet of Things Journal, vol. 6, no. 2, pp. 2241–2263, April 2019.

[10] G. Zhu, D. Liu, Y. Du, C. You, J. Zhang, and K. Huang, “Towards an intelligent edge: Wireless communication meets machine learning,” CoRR, vol. abs/1809.00343, 2018. [Online]. Available: http://arxiv.org/abs/1809.00343

[11] H. Wu, S. Deng, W. Li, J. Yin, Q. Yang, Z. Wu, and A. Y. Zomaya, “Revenue-driven service provisioning for resource sharing in mobile cloud computing,” in Service-Oriented Computing, M. Maximilien, A. Vallecillo, J. Wang, and M. Oriol, Eds. Cham: Springer International Publishing, 2017, pp. 625–640.

[12] S. Deng, Z. Xiang, J. Yin, J. Taheri, and A. Y. Zomaya, “Composition-driven IoT service provisioning in distributed edges,” IEEE Access, vol. 6, pp. 54258–54269, 2018.

[13] L. Chen, J. Xu, S. Ren, and P. Zhou, “Spatiotemporal edge service placement: A bandit learning approach,” IEEE Transactions on Wireless Communications, vol. 17, no. 12, pp. 8388–8401, Dec 2018.

[14] S. Deng, H. Wu, W. Tan, Z. Xiang, and Z. Wu, “Mobile service selection for composition: An energy consumption perspective,” IEEE Trans. Automation Science and Engineering, vol. 14, no. 3, pp. 1478–1490, 2017. [Online]. Available: https://doi.org/10.1109/TASE.2015.2438020

[15] S. Deng, L. Huang, J. Taheri, J. Yin, M. Zhou, and A. Y. Zomaya, “Mobility-aware service composition in mobile communities,” IEEE Trans. Systems, Man, and Cybernetics: Systems, vol. 47, no. 3, pp. 555–568, 2017. [Online]. Available: https://doi.org/10.1109/TSMC.2016.2521736

[16] Z. Wu, J. Yin, S. Deng, J. Wu, Y. Li, and L. Chen, “Modern service industry and crossover services: Development and trends in China,” IEEE Trans. Services Computing, vol. 9, no. 5, pp. 664–671, 2016. [Online]. Available: https://doi.org/10.1109/TSC.2015.2418765

[17] S. Zhang, P. He, K. Suto, P. Yang, L. Zhao, and X. Shen, “Cooperative edge caching in user-centric clustered mobile networks,” IEEE Transactions on Mobile Computing, vol. 17, no. 8, pp. 1791–1805, Aug 2018.

[18] H. Zhao, W. Du, W. Liu, T. Lei, and Q. Lei, “QoE aware and cell capacity enhanced computation offloading for multi-server mobile edge computing systems with energy harvesting devices,” in 2018 IEEE International Conference on Ubiquitous Intelligence Computing, Oct 2018, pp. 671–678.


[19] H. Zhao, S. Deng, C. Zhang, W. Du, Q. He, and J. Yin, “A mobility-aware cross-edge computation offloading framework for partitionable applications,” in 2019 IEEE International Conference on Web Services, Jul 2019, pp. 193–200.

[20] M. Min, L. Xiao, Y. Chen, P. Cheng, D. Wu, and W. Zhuang, “Learning-based computation offloading for IoT devices with energy harvesting,” IEEE Transactions on Vehicular Technology, vol. 68, no. 2, pp. 1930–1941, Feb 2019.

[21] X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji, and M. Bennis, “Performance optimization in mobile-edge computing via deep reinforcement learning,” in 2018 IEEE 88th Vehicular Technology Conference (VTC-Fall), Aug 2018, pp. 1–6.

[22] S. Deng, H. Wu, D. Hu, and J. L. Zhao, “Service selection for composition with QoS correlations,” IEEE Trans. Services Computing, vol. 9, no. 2, pp. 291–303, 2016. [Online]. Available: https://doi.org/10.1109/TSC.2014.2361138

[23] C. Zhang, H. Zhao, and S. Deng, “A density-based offloading strategy for IoT devices in edge computing systems,” IEEE Access, vol. 6, pp. 73520–73530, 2018.

[24] J. Park, S. Samarakoon, M. Bennis, and M. Debbah, “Wireless network intelligence at the edge,” CoRR, vol. abs/1812.02858, 2018. [Online]. Available: http://arxiv.org/abs/1812.02858

[25] J. Wang, J. Zhang, W. Bao, X. Zhu, B. Cao, and P. S. Yu, “Not just privacy: Improving performance of private deep learning in mobile cloud,” CoRR, vol. abs/1809.03428, 2018. [Online]. Available: http://arxiv.org/abs/1809.03428

[26] E. Li, Z. Zhou, and X. Chen, “Edge intelligence: On-demand deep learning model co-inference with device-edge synergy,” in Proceedings of the 2018 Workshop on Mobile Edge Communications, MECOMM@SIGCOMM 2018, Budapest, Hungary, August 20, 2018, 2018, pp. 31–36.

[27] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, pp. 1929–1958, 2014.

[28] V. Sze, Y. Chen, T. Yang, and J. S. Emer, “Efficient processing of deep neural networks: A tutorial and survey,” Proceedings of the IEEE, vol. 105, no. 12, pp. 2295–2329, Dec 2017.

[29] J. Lee, J. K. Eshraghian, K. Cho, and K. Eshraghian, “Adaptive precision CNN accelerator using radix-X parallel connected memristor crossbars,” ArXiv, vol. abs/1906.09395, 2019.

[30] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, “Artificial neural networks-based machine learning for wireless networks: A tutorial,” IEEE Communications Surveys & Tutorials, pp. 1–1, 2019.

[31] O. Abari, H. Rahul, and D. Katabi, “Over-the-air function computation in sensor networks,” CoRR, vol. abs/1612.02307, 2016. [Online]. Available: http://arxiv.org/abs/1612.02307

[32] G. Zhu, L. Chen, and K. Huang, “Over-the-air computation in MIMO multi-access channels: Beamforming and channel feedback,” CoRR, vol. abs/1803.11129, 2018. [Online]. Available: http://arxiv.org/abs/1803.11129

[33] G. Zhu, Y. Wang, and K. Huang, “Low-latency broadband analog aggregation for federated edge learning,” CoRR, vol. abs/1812.11494, 2018. [Online]. Available: http://arxiv.org/abs/1812.11494

[34] K. Yang, T. Jiang, Y. Shi, and Z. Ding, “Federated learning via over-the-air computation,” CoRR, vol. abs/1812.11750, 2018. [Online]. Available: http://arxiv.org/abs/1812.11750

[35] D. Liu, G. Zhu, J. Zhang, and K. Huang, “Wireless data acquisition for edge learning: Importance aware retransmission,” CoRR, vol. abs/1812.02030, 2018. [Online]. Available: http://arxiv.org/abs/1812.02030

[36] J.-H. Ahn, O. Simeone, and J. Kang, “Wireless federated distillation for distributed edge learning with heterogeneous data,” ArXiv, vol. abs/1907.02745, 2019.

[37] Y. Shen, Y. Shi, J. Zhang, and K. B. Letaief, “A graph neural network approach for scalable wireless power control,” ArXiv, vol. abs/1907.08487, 2019.

[38] D. A. Temesgene, M. Miozzo, and P. Dini, “Dynamic control of functional splits for energy harvesting virtual small cells: a distributed reinforcement learning approach,” ArXiv, vol. abs/1906.05735v1, 2019.

[39] L. Chen, J. Xu, S. Ren, and P. Zhou, “Spatiotemporal edge service placement: A bandit learning approach,” IEEE Transactions on Wireless Communications, vol. 17, no. 12, pp. 8388–8401, Dec 2018.

[40] Y. Chen, S. Deng, H. Zhao, Q. He, and H. G. Y. Li, “Data-intensive application deployment at edge: A deep reinforcement learning approach,” in 2019 IEEE International Conference on Web Services, Jul 2019, pp. 355–359.

[41] K. Arulkumaran, M. P. Deisenroth, M. Brundage, and A. A. Bharath, “Deep reinforcement learning: A brief survey,” IEEE Signal Processing Magazine, vol. 34, no. 6, pp. 26–38, Nov 2017.

[42] X. Qiu, L. Liu, W. Chen, Z. Hong, and Z. Zheng, “Online deep reinforcement learning for computation offloading in blockchain-empowered mobile edge computing,” IEEE Transactions on Vehicular Technology, pp. 1–1, 2019.

[43] X. Chen, H. Zhang, C. Wu, S. Mao, Y. Ji, and M. Bennis, “Optimized computation offloading performance in virtual edge computing systems via deep reinforcement learning,” IEEE Internet of Things Journal, vol. 6, no. 3, pp. 4005–4018, June 2019.

[44] L. Huang, S. Bi, and Y. A. Zhang, “Deep reinforcement learning for online offloading in wireless powered mobile-edge computing networks,” CoRR, vol. abs/1808.01977, 2018. [Online]. Available: http://arxiv.org/abs/1808.01977

[45] L. Lei, H. Xu, X. Xiong, K. Zheng, W. Xiang, and X. Wang, “Multi-user resource control with deep reinforcement learning in IoT edge computing,” ArXiv, vol. abs/1906.07860, 2019.

[46] Q. Qi and Z. Ma, “Vehicular edge computing via deep reinforcement learning,” CoRR, vol. abs/1901.04290, 2019. [Online]. Available: http://arxiv.org/abs/1901.04290

[47] Z. Yang, Y. Liu, Y. Chen, and N. Al-Dhahir, “Cache-aided NOMA mobile edge computing: A reinforcement learning approach,” ArXiv, vol. abs/1906.08812, 2019.

[48] J. Konecny, H. B. McMahan, F. X. Yu, P. Richtarik, A. T. Suresh, and D. Bacon, “Federated learning: Strategies for improving communication efficiency,” CoRR, vol. abs/1610.05492, 2016. [Online]. Available: http://arxiv.org/abs/1610.05492

[49] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth, “Practical secure aggregation for federated learning on user-held data,” CoRR, vol. abs/1611.04482, 2016. [Online]. Available: http://arxiv.org/abs/1611.04482

[50] P. S. Chandakkar, Y. Li, P. L. K. Ding, and B. Li, “Strategies for re-training a pruned neural network in an edge computing paradigm,” in 2017 IEEE International Conference on Edge Computing (EDGE), June 2017, pp. 244–247.

[51] M. Rusci, A. Capotondi, and L. Benini, “Memory-driven mixed low precision quantization for enabling deep network inference on microcontrollers,” CoRR, vol. abs/1905.13082, 2019. [Online]. Available: http://arxiv.org/abs/1905.13082

[52] R. Venkatesan and B. Li, “Diving deeper into mentee networks,” CoRR, vol. abs/1604.08220, 2016. [Online]. Available: http://arxiv.org/abs/1604.08220

[53] I. Chakraborty, D. Roy, A. Ankit, and K. Roy, “Efficient hybrid network architectures for extremely quantized neural networks enabling intelligence at the edge,” CoRR, vol. abs/1902.00460, 2019. [Online]. Available: http://arxiv.org/abs/1902.00460

[54] U. Thakker, J. G. Beu, D. Gope, G. Dasika, and M. Mattina, “Run-time efficient RNN compression for inference on edge devices,” CoRR, vol. abs/1906.04886, 2019. [Online]. Available: http://arxiv.org/abs/1906.04886

[55] G. Li, L. Liu, X. Wang, X. Dong, P. Zhao, and X. Feng, “Auto-tuning neural network quantization framework for collaborative inference between the cloud and edge,” CoRR, vol. abs/1812.06426, 2018. [Online]. Available: http://arxiv.org/abs/1812.06426

[56] L. Zhang, Z. Tan, J. Song, J. Chen, C. Bao, and K. Ma, “SCAN: A scalable neural networks framework towards compact and efficient models,” CoRR, vol. abs/1906.03951, 2019. [Online]. Available: http://arxiv.org/abs/1906.03951

[57] C. Gupta, A. S. Suggala, A. Goyal, H. V. Simhadri, B. Paranjape, A. Kumar, S. Goyal, R. Udupa, M. Varma, and P. Jain, “ProtoNN: Compressed and accurate kNN for resource-scarce devices,” in Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, PMLR 70, 2017.

[58] M. J. Kusner, S. Tyree, K. Q. Weinberger, and K. Agrawal, “Stochastic neighbor compression,” in ICML, 2014.

[59] I. Chakraborty, D. Roy, I. Garg, A. Ankit, and K. Roy, “PCA-driven hybrid network design for enabling intelligence at the edge,” CoRR, vol. abs/1906.01493, 2019. [Online]. Available: http://arxiv.org/abs/1906.01493

[60] J. Hostetler, “Toward runtime-throttleable neural networks,” CoRR, vol. abs/1905.13179, 2019. [Online]. Available: http://arxiv.org/abs/1905.13179

[61] M. Blot, D. Picard, M. Cord, and N. Thome, “Gossip training for deep learning,” CoRR, vol. abs/1611.09726, 2016. [Online]. Available: http://arxiv.org/abs/1611.09726


[62] J. Daily, A. Vishnu, C. Siegel, T. Warfel, and V. Amatya, “GossipGraD: Scalable deep learning using gossip communication based asynchronous gradient descent,” CoRR, vol. abs/1803.05880, 2018. [Online]. Available: http://arxiv.org/abs/1803.05880

[63] H. Kim, J. Park, M. Bennis, and S. Kim, “On-device federated learning via blockchain and its latency analysis,” CoRR, vol. abs/1808.03949, 2018. [Online]. Available: http://arxiv.org/abs/1808.03949

Shuiguang Deng is currently a full professor at the College of Computer Science and Technology in Zhejiang University, China, where he received BS and PhD degrees, both in Computer Science, in 2002 and 2007, respectively. He previously worked at the Massachusetts Institute of Technology in 2014 and Stanford University in 2015 as a visiting scholar. His research interests include Edge Computing, Service Computing, Mobile Computing, and Business Process Management. He serves as an associate editor for the journal IEEE Access and for IET Cyber-Physical Systems: Theory & Applications. Up to now, he has published more than 100 papers in journals and refereed conferences. In 2018, he was granted the Rising Star Award by IEEE TCSVC. He is a fellow of IET and a senior member of IEEE.

Hailiang Zhao received the B.S. degree in 2019 from the School of Computer Science and Technology, Wuhan University of Technology, Wuhan, China. He is currently pursuing the Ph.D. degree with the College of Computer Science and Technology, Zhejiang University, Hangzhou, China. He was a recipient of the Best Student Paper Award of IEEE ICWS 2019. His research interests include edge computing, service computing and machine learning.

Jianwei Yin received the Ph.D. degree in computer science from Zhejiang University (ZJU) in 2001. He was a Visiting Scholar with the Georgia Institute of Technology. He is currently a Full Professor with the College of Computer Science, ZJU. Up to now, he has published more than 100 papers in top international journals and conferences. His current research interests include service computing and business process management. He is an Associate Editor of the IEEE Transactions on Services Computing.

Schahram Dustdar is a Full Professor of Computer Science (Informatics) with a focus on Internet Technologies, heading the Distributed Systems Group at TU Wien. He has been Chairman of the Informatics Section of the Academia Europaea since December 9, 2016, and an IEEE Fellow since January 2016. From 2004 to 2010 he was Honorary Professor of Information Systems at the Department of Computing Science at the University of Groningen (RuG), The Netherlands.

From December 2016 until January 2017 he was a Visiting Professor at the University of Sevilla, Spain, and from January until June 2017 he was a Visiting Professor at UC Berkeley, USA. He is a member of the IEEE Conference Activities Committee (CAC) (since 2016), of the Section Committee of Informatics of the Academia Europaea (since 2015), and of the Academia Europaea: The Academy of Europe, Informatics Section (since 2013). He is a recipient of the ACM Distinguished Scientist award (2009) and the IBM Faculty Award (2012). He is an Associate Editor of IEEE Transactions on Services Computing, ACM Transactions on the Web, and ACM Transactions on Internet Technology, and is on the editorial board of IEEE Internet Computing. He is the Editor-in-Chief of Computing (an SCI-ranked journal of Springer).

Albert Y. Zomaya is the Chair Professor of High Performance Computing & Networking in the School of Information Technologies, University of Sydney, and he also serves as the Director of the Centre for Distributed and High Performance Computing. Professor Zomaya has published more than 600 scientific papers and articles and is author, co-author or editor of more than 20 books. He is the Founding Editor-in-Chief of the IEEE Transactions on Sustainable Computing and serves as an associate editor for more than 20 leading journals. Professor Zomaya served as Editor-in-Chief of the IEEE Transactions on Computers (2011–2014). He is a Chartered Engineer and a Fellow of AAAS, IEEE, and IET. Professor Zomaya's research interests are in the areas of parallel and distributed computing and complex systems.