Distributed Learning by Neural Networks
Rome, 28/01/2015
Candidate: Roberto Fierimonte
Supervisor: Prof. Massimo Panella
Co-supervisor: Dr. Simone Scardapane
OUTLINE
• Technological Context
• Distributed Machine Learning
• Algorithms Development
• Experimental Results
• Conclusions
TECHNOLOGICAL CONTEXT
Increasing trends in ubiquitous information-collecting devices, in storage capability, and in communication speed (Moore's law of ICT). As a consequence, a growing amount of data is available to be stored in databases and data warehouses, whose location is often distributed across a network of interconnected nodes.
PROBLEMS:
• Data may not be allowed to be shared
• Network topology changes dynamically
• Dependence on a central hub
• Constraints on resources
TARGET: developing efficient algorithms to perform decentralized analysis and inference on distributed data
DISTRIBUTED MACHINE LEARNING
A machine learning problem that requires the cooperation of several agents (or nodes) is called a distributed learning problem. There are two paradigms for distributed learning problems:
• Model-distributed, in which only a part of the overall model's parameters is known to each node
• Data-distributed, in which only a fraction of the overall training set is available to each node
[Figure: schematic of a 4-node network illustrating the two paradigms, with the dataset split into the portions S1, S2, S3, S4 held by the nodes; legend: Node, Dataset, Model, Input/Output, Link]
DISTRIBUTED MACHINE LEARNING
“Given a network and a segregated dataset T such that Tk is related to the k-th node of the network, how can each machine learn a mapping of examples to class labels in T without communicating any data point of Tk?”
PRACTICAL APPLICATIONS:
• Big data problems
  • Music classification
  • Financial forecasting
• Learning over sensor networks
  • Environmental monitoring
  • Infomobility
• Privacy-preserving data mining
  • Electronic health
  • Fraud detection
DISTRIBUTED MACHINE LEARNING
LEARNING BY CONSENSUS
Distributed Average Consensus (DAC) is a distributed protocol for calculating the average of a series of measurements within a network.
• No need to exchange data between nodes
• Robustness with respect to the network topology
• Totally distributed strategy
• Ease of implementation
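As a rough illustration of the protocol (not the thesis implementation), the MATLAB sketch below iterates the linear consensus update on a toy ring network until all nodes agree on the network-wide average; the Metropolis-Hastings mixing weights, the topology, and the variable names are illustrative assumptions.

```matlab
% Distributed Average Consensus (DAC) - minimal sketch.
% Each node i holds a local measurement theta(i); at every step it replaces
% its value with a weighted mean of its own and its neighbours' values.
N = 8;                                   % number of nodes
ring = circshift(eye(N), 1);             % cycle graph guarantees connectivity
A = double((ring + ring') > 0);          % symmetric adjacency matrix, no self-loops
deg = sum(A, 2);

% Metropolis-Hastings mixing weights (doubly stochastic by construction)
W = zeros(N);
for i = 1:N
    for j = 1:N
        if A(i,j) == 1
            W(i,j) = 1 / (1 + max(deg(i), deg(j)));
        end
    end
    W(i,i) = 1 - sum(W(i,:));
end

theta  = randn(N, 1);                    % local measurements
target = mean(theta);                    % value every node should converge to

for k = 1:200                            % consensus iterations
    theta = W * theta;                   % each node mixes with its neighbours
end
fprintf('max deviation from the average: %.2e\n', max(abs(theta - target)));
```

With a doubly stochastic mixing matrix and a connected topology the iteration converges to the exact network-wide average, which is why no central hub and no exchange of raw data are needed.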
IDEA: train a common learning machine on the local dataset of every node, and then compute the global solution by averaging the local solutions through the Consensus protocol.
BATCH LEARNING: when the entire training set is available before starting the training.
ALGORITHMS DEVELOPMENT
The proposed algorithms are a combination of the Consensus protocol and Random Vector Functional-Link networks (RVFLs).
If the size of the hidden layer is sufficiently large, RVFLs are universal approximators for a wide range of basis functions h [Igelnik and Pao, 1995].
[Figure: RVFL network with inputs x1, x2, hidden units h1, h2, h3, output weights β1, β2, β3 and output y]
y = \sum_{i=1}^{m} \beta_i \, h_i(x, w_i) = \beta^T h(x, w_1, \ldots, w_m)

REGULARIZED LEAST SQUARES: \beta^* = (H^T H + \lambda I)^{-1} H^T Y
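A minimal MATLAB sketch of a centralized RVFL trained with regularized least squares, assuming a sigmoid basis function and toy data; the variable names (X, Y, W, b, H, beta) are illustrative and not taken from the thesis code.

```matlab
% Centralized RVFL with regularized least squares - minimal sketch.
% X: n-by-d input matrix, Y: n-by-1 target vector (toy data).
n = 200; d = 4; m = 50; lambda = 1e-2;
X = randn(n, d);
Y = sin(X(:,1)) + 0.1*randn(n, 1);          % toy regression target

% Random hidden parameters (weights and biases), fixed after generation
W = randn(d, m);
b = randn(1, m);

% Hidden-layer expansion with a sigmoid basis function
H = 1 ./ (1 + exp(-(X*W + repmat(b, n, 1))));

% Output weights: beta* = (H'H + lambda*I)^(-1) H'Y
beta = (H'*H + lambda*eye(m)) \ (H'*Y);

% Prediction on the training data
Yhat = H * beta;
fprintf('training RMSE: %.3f\n', sqrt(mean((Y - Yhat).^2)));
```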
ALGORITHMS DEVELOPMENT
CONSENSUS-BASED BATCH DISTRIBUTED RVFL:
1. INITIALIZATION: the RVFLs' hidden parameters w_1, ..., w_m are generated randomly and then shared among the nodes.
2. LOCAL SOLUTION COMPUTATION: each node i calculates its local estimate \beta_i^* using its corresponding local dataset.
3. GLOBAL SOLUTION COMPUTATION: once each node has computed its own local estimate, the global solution is computed using the Consensus protocol, resulting in the average of the local estimates.
\beta_i^* = (H_i^T H_i + \lambda I)^{-1} H_i^T Y_i

\beta^* = \frac{1}{N} \sum_{i=1}^{N} \beta_i^*
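The MATLAB sketch below simulates the three steps serially on toy data; the final averaging is taken directly in place of the DAC protocol, and all names and the data generation are illustrative assumptions rather than the thesis code.

```matlab
% Consensus-based batch distributed RVFL - minimal serial simulation.
% The N nodes are simulated in a loop; in the actual setting each beta_i
% is computed locally and only the output weights are exchanged via DAC.
N = 4; d = 4; m = 50; lambda = 1e-2;
W = randn(d, m); b = randn(1, m);            % shared random hidden parameters

betas = zeros(m, N);
for i = 1:N
    Xi = randn(100, d);                      % local dataset of node i (toy data)
    Yi = sin(Xi(:,1)) + 0.1*randn(100, 1);
    Hi = 1 ./ (1 + exp(-(Xi*W + repmat(b, size(Xi,1), 1))));
    % Local solution: beta_i* = (Hi'Hi + lambda*I)^(-1) Hi'Yi
    betas(:, i) = (Hi'*Hi + lambda*eye(m)) \ (Hi'*Yi);
end

% Global solution: average of the local solutions.
% Here the mean is taken directly; in the distributed implementation it is
% obtained by running the DAC protocol over the beta_i* vectors.
beta_global = mean(betas, 2);
```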
ALGORITHMS DEVELOPMENT
ONLINE LEARNING: it is not always possible to have the entire dataset available before starting the training:
• Data are obtained in real time
• Datasets are too large to be processed in batch
RECURSIVE LEAST SQUARES (RLS):
K_{k+1} = K_k + H_{k+1}^T H_{k+1}
\beta^{k+1} = \beta^{k} + K_{k+1}^{-1} H_{k+1}^T (Y_{k+1} - H_{k+1} \beta^{k})

LEAST MEAN SQUARES (LMS):
\beta^{k+1} = \beta^{k} - \alpha_k \left[ H_{k+1}^T H_{k+1} \beta^{k} - H_{k+1}^T Y_{k+1} + \lambda \beta^{k} \right]
(the term in brackets is the gradient of the local cost L_{k+1})
ORIGINAL PROPOSAL: Extending the results obtained in distributed batch learning to distributed online learning problems
ALGORITHMS DEVELOPMENT
CONSENSUS-BASED ONLINE DISTRIBUTED RVFL (RLS version):
1. INITIALIZATION: the RVFLs' hidden parameters w_1, ..., w_m are generated randomly and then shared among the nodes.
2. LOCAL SOLUTION UPDATE: each node updates its corresponding local estimate using its new data according to RLS.
3. GLOBAL SOLUTION UPDATE: once each node has computed its own local estimate, the global solution is computed using the Consensus protocol, resulting in the average of the local updates.
K_{k+1}^i = K_k^i + (H_{k+1}^i)^T H_{k+1}^i

\beta_i^{k+1} = \beta_C^{k} + (K_{k+1}^i)^{-1} (H_{k+1}^i)^T (Y_{k+1}^i - H_{k+1}^i \beta_C^{k})

\beta_C^{k+1} = \frac{1}{N} \sum_{i=1}^{N} \beta_i^{k+1}
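A serial MATLAB sketch of a single step (k to k+1) of the RLS version is given below; it assumes each K_0^i is initialized with the regularization term λI, uses toy data chunks, and replaces the Consensus protocol with a plain mean, so all names and choices are illustrative.

```matlab
% Consensus-based online distributed RVFL, RLS version - one step, serial sketch.
% Each node i keeps its matrix K_i; the common estimate beta_C is
% re-synchronized at every step through the Consensus protocol (here: a mean).
N = 4; d = 4; m = 50; lambda = 1e-2; chunk = 20;
W = randn(d, m); b = randn(1, m);            % shared random hidden parameters

beta_C = zeros(m, 1);                        % common estimate beta_C^k
K = repmat({lambda * eye(m)}, 1, N);         % K_0^i initialized with the regularizer
betas = zeros(m, N);

for i = 1:N
    % New chunk of data arriving at node i at step k+1 (toy data)
    Xi = randn(chunk, d);
    Yi = sin(Xi(:,1)) + 0.1*randn(chunk, 1);
    Hi = 1 ./ (1 + exp(-(Xi*W + repmat(b, chunk, 1))));

    % RLS update of the local estimate, starting from the common beta_C
    K{i} = K{i} + Hi'*Hi;
    betas(:, i) = beta_C + K{i} \ (Hi' * (Yi - Hi*beta_C));
end

% Global solution update: average of the local updates (DAC in the real setting)
beta_C = mean(betas, 2);
```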
ALGORITHMS DEVELOPMENT
CONSENSUS-BASED ONLINE DISTRIBUTED RVFL (LMS version):
1. INITIALIZATION: the RVFLs' hidden parameters w_1, ..., w_m are generated randomly and then shared among the nodes.
2. LOCAL SOLUTION UPDATE: each node updates its corresponding local estimate using its new data according to LMS.
3. GLOBAL SOLUTION UPDATE: once each node has computed its own local estimate, the global solution is computed using the Consensus protocol, resulting in the average of the local updates.
\beta_i^{k+1} = \beta_C^{k} - \alpha_k^i \left[ (H_{k+1}^i)^T H_{k+1}^i \beta_C^{k} - (H_{k+1}^i)^T Y_{k+1}^i + \lambda \beta_C^{k} \right]

\beta_C^{k+1} = \frac{1}{N} \sum_{i=1}^{N} \beta_i^{k+1}

(the term in brackets is the gradient of the local cost L_{k+1})
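Under the same assumptions as the previous sketch, the LMS version only changes the local update: a single gradient step from the common estimate instead of the RLS correction. The lines below are a hypothetical drop-in replacement for the RLS update inside the loop of that sketch (alpha_k is an illustrative step size).

```matlab
% LMS version of the local update - drop-in replacement for the RLS lines
% in the previous sketch; no matrix K_i is needed in this variant.
alpha_k = 1e-3;                                            % illustrative step size
grad = Hi'*(Hi*beta_C) - Hi'*Yi + lambda*beta_C;           % gradient of the local cost L_{k+1}
betas(:, i) = beta_C - alpha_k * grad;
% The global solution update is unchanged: beta_C = mean(betas, 2) via DAC.
```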
ALGORITHMS DEVELOPMENT
ALGORITHMS IMPLEMENTATION: the proposed algorithms are implemented in MATLAB, using the Parallel Computing Toolbox (PCT), in order to test their efficacy.
• Use of the spmd command to run the code on a pool of machines, or on a simulated distributed architecture (up to 12 nodes) on a single machine (see the sketch below).
• A serial version of the code is implemented to perform large-scale simulations.
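As a rough, hypothetical illustration of this setup (not the code from the linked repository), the sketch below uses spmd so that each worker plays the role of one network node, with gplus standing in for the DAC exchange; it assumes the Parallel Computing Toolbox is available and all variable names are illustrative.

```matlab
% spmd-based simulation of the batch consensus algorithm (requires PCT).
if isempty(gcp('nocreate')), parpool(4); end   % one worker per simulated node
d = 4; m = 50; lambda = 1e-2;
W = randn(d, m); b = randn(1, m);              % shared random hidden parameters

spmd
    % Each worker generates (or loads) its own local dataset
    Xi = randn(100, d);
    Yi = sin(Xi(:,1)) + 0.1*randn(100, 1);
    Hi = 1 ./ (1 + exp(-(Xi*W + repmat(b, size(Xi,1), 1))));
    beta_i = (Hi'*Hi + lambda*eye(m)) \ (Hi'*Yi);   % local solution

    % Global averaging across workers: gplus sums beta_i over all workers,
    % standing in for the DAC protocol used on a real network.
    beta_global = gplus(beta_i) / numlabs;
end
```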
EXPERIMENTAL RESULTS
EXPERIMENTAL SETUP
Simulations were performed on the 4 freely available datasets listed below. Types of simulations performed:
• Performance and scalability
• Influence of the network topology
• Comparison of training time
• Performance of the original online algorithms
All results were obtained on a machine equipped with an Intel i5 @ 3.00 GHz processor and 16 GB of RAM, running MATLAB R2013a.
Dataset    | N° of Features | N° of Instances | Predicted value | Task type
Banknote   | 4              | 1372            | Banknote class  | Binary classification
g50c       | 50             | 550             | Gaussian label  | Binary classification
CCPP       | 4              | 9568            | Energy output   | Regression
Garageband | 44             | 1856            | Music genre     | 9-class classification
EXPERIMENTAL RESULTS
PERFORMANCE AND SCALABILITY
[Figure: test error vs. number of nodes in the network on the four datasets (Error [%] for the classification datasets, NRMSE for CCPP), comparing Centralized-RVFL, Consensus-RVFL and Local-RVFL]
EXPERIMENTAL RESULTS
INFLUENCE OF NETWORK TOPOLOGY
COMPARISON OF TRAINING TIME
[Figure: number of consensus iterations vs. number of nodes for different network topologies (Cyclic Lattice, Fully Connected, Linear Topology K=1, Linear Topology K=4, Random Topology p=0.25, Random Topology p=0.5), and training time [s] vs. number of nodes comparing Centralized-RVFL, Consensus-RVFL and Local-RVFL]
EXPERIMENTAL RESULTS
ORIGINAL ONLINE ALGORITHMS PERFORMANCE
[Figure: test error vs. number of online iterations on the four datasets, comparing Centralized-RVFL, Consensus-RVFL (RLS), Consensus-RVFL (LMS) and Local-RVFL]
CONCLUSIONS
The proposed algorithms solve the distributed online learning problem in a totally decentralized way, requiring only local communications and no exchange of data points. Experimental results show that the algorithms are able to achieve performance comparable with a centralized solution, except for the LMS version of the online algorithm. The application of the proposed algorithms to a music classification problem is investigated in [Scardapane et al., 2015b]. All the code developed for the thesis is freely available for consultation at: https://github.com/roberto-fierimonte/tesi-rvfl-online
FUTURE DEVELOPMENTS:
• Application to model-distributed problems
• Application to problems with constraints on energy and time
• Application to unsupervised learning problems
Distributed Learning by Neural Networks
THANK YOU
FOR YOUR ATTENTION