Liang Xiao, Xiamen University
International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt)
Shanghai, May 7, 2018
Game-theoretic Methods for Modeling and Defending Against Advanced Persistent Threats in Wireless Systems
Outline
Background & motivations
Game theoretic study on APT defense
APT defense game model
Prospect theory based study on APT
Colonel Blotto game based study on APT
Dynamic APT defense game
Reinforcement learning based APT defense solution
Conclusion
2
Slingshot APT used malware to spy on international targets from 2012 until Feb. 2018
Lazarus APT stole $81 million from Central Bank of Bangladesh in 2016
$31 million was stolen from the Russian central bank in 2016
500 million Yahoo user accounts were leaked in 2016
APTs in Cyber Systems
3 [Threat Landscape Survey'17]
APT Attacks
Target the objective
Infiltrate malware onto the target system
Control by command-and-control server (installing other malware)
Spread the attack onto several systems
Exfiltrate target data from the victim's network
Cover tracks to maintain access to the victim's system over a long time
4
Another APT Example
Reconnaissance: leverage information from a variety of sources to understand the target
Incursion: use social engineering to deliver targeted malware
Discovery: stay "low and slow", map the defenses from the inside and create a battle plan
Capture: collect information over an extended period, install malware to secretly acquire data or disrupt operations
Exfiltration: send information to the attacker
5
Hackers cover their tracks to access the data for a long time without being detected
APT attackers aim to steal data rather than break the target network or destroy the data
APT attackers use multiple sophisticated attack methods
Defense against APTs: detect APTs as early as possible to minimize the damage, instead of trying to keep attackers out
Challenges to Detect APTs
6
Outline
Background & motivations
Game theoretic study on APT defense
APT defense game model
Prospect theory based study on APT
Colonel Blotto game based study on APT
Dynamic APT defense game
Reinforcement learning based APT defense solution
Conclusion
7
APT attackers are motivated by their hunger for money and secrets
APTs are stealthy, continuous, sophisticated and long-term
Game theory can capture the long-term, continuous interaction between the APT attacker and the defender over the system resources:
Model the interactions between the APT attackers and defenders
Understand the fundamental tradeoffs between the security risk and the APT defense cost
Help design APT defense strategies
Game Theoretic Study on APTs
8
APT Defense Game
APT game between an attacker choosing its attack path and a defender selecting its best response according to the attack classification result [Fang'14]
Stealthy attack game with an asymmetric feedback model under limited attack or protection resources [Zhang'15]
Cyber-physical signaling game between a cloud and a mobile device, and a FlipIt game between an APT attacker and a cloud defender [Pawlick'15]
Three-player Stackelberg game, in which a defender as the leader addresses both APTs and insider attacks [Feng'15]
Zero-sum matrix game against APT movements without being aware of the adversary model [Rass'17]
Dynamic APT game between an attacker choosing its attack resources and a defender determining its prevention and recovery strength [Yang'17]
9
FlipIt Game: non-zero-sum game between an APT attacker and a defender competing for a cyber resource [Dijk'13]
Attacker: whether to compromise the cyber system
Defender: whether to restore the cyber system
Goal: each player maximizes its utility, which increases with its resource control time and decreases with its attack/defense cost
FlipIt Game
10
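To make the FlipIt interaction above concrete, here is a minimal Python sketch (not part of the original talk) that simulates a FlipIt-style game in which both players move periodically with random phases; the periods, per-move costs, and time horizon are illustrative assumptions.

```python
import numpy as np

def flipit_utilities(attack_period, defend_period, attack_cost, defend_cost,
                     horizon=10_000.0, seed=0):
    """Simulate a FlipIt-style game with periodic moves and random phases.

    Each player's utility is the fraction of time it controls the resource
    minus its per-move cost times its move rate (an illustrative model in the
    spirit of the FlipIt formulation of [Dijk'13]).
    """
    rng = np.random.default_rng(seed)
    # Random phases so neither player knows when the other moves.
    attack_times = np.arange(rng.uniform(0, attack_period), horizon, attack_period)
    defend_times = np.arange(rng.uniform(0, defend_period), horizon, defend_period)
    # Merge the move sequences and track who controls the resource over time.
    events = sorted([(t, 'A') for t in attack_times] + [(t, 'D') for t in defend_times])
    owner, last_t, attacker_time = 'D', 0.0, 0.0
    for t, player in events:
        if owner == 'A':
            attacker_time += t - last_t
        owner, last_t = player, t
    if owner == 'A':
        attacker_time += horizon - last_t
    attacker_frac = attacker_time / horizon
    u_attacker = attacker_frac - attack_cost / attack_period
    u_defender = (1.0 - attacker_frac) - defend_cost / defend_period
    return u_defender, u_attacker

print(flipit_utilities(attack_period=5.0, defend_period=3.0,
                       attack_cost=1.0, defend_cost=0.5))
```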
APT Defense Game Model
APT attacker chooses its time interval $y_i$ to launch APTs on storage device i
Defender chooses its time interval $x_i$ to scan the devices
Random duration $z_i$ to complete an APT attack
Safe time for the data stored on storage device i: $\min\!\left(\frac{z_i\, y_i}{x_i},\, 1\right)$
11
APT Defense Game with Pure Strategy
Normalized attack/defense interval over each device
Random attack duration $z_i$ quantized into L non-zero levels, each with probability $P_{il}$, $0 \le l \le L$
Limited overall attack/defense computation resources
Expected utilities of the defender and the attacker:
$U_D^{EUT}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S}\left[\sum_{l=0}^{L} P_{il}\,\min\!\left(\frac{l\, y_i}{L\, x_i},\, 1\right) + G\, x_i\right]$
$U_A^{EUT}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S}\left[\sum_{l=0}^{L} P_{il}\left(1 - \min\!\left(\frac{l\, y_i}{L\, x_i},\, 1\right)\right) - C\, x_i\, y_i\right]$
where the $G x_i$ term is the gain of a longer scan interval and the $C$ term is the attack cost
Both players choose their policies to maximize their own expected utilities
12
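A minimal sketch (not from the talk) that evaluates the expected utilities in the form reconstructed above; the roles of x_i (scan interval) and y_i (attack interval) and the exact shape of the attack cost term follow that reconstruction and may differ in detail from the original formulation.

```python
import numpy as np

def expected_utilities(x, y, P, G, C):
    """EUT utilities of the defender and attacker, using the reconstructed form above.

    x[i]   : normalized scan interval of storage device i (defender)
    y[i]   : normalized attack interval of device i (attacker)
    P[i,l] : probability that the attack duration on device i is z = l/L, 0 <= l <= L
    """
    x, y, P = np.asarray(x, float), np.asarray(y, float), np.asarray(P, float)
    L = P.shape[1] - 1
    l = np.arange(L + 1)
    # Safe-time fraction min(z*y/x, 1) for each device i and duration level l.
    safe = np.minimum((l[None, :] / L) * y[:, None] / np.maximum(x[:, None], 1e-12), 1.0)
    u_def = np.sum(np.sum(P * safe, axis=1) + G * x)
    u_att = np.sum(np.sum(P * (1.0 - safe), axis=1) - C * x * y)
    return u_def, u_att

# Toy example: S = 2 devices, L = 2 duration levels, uniform duration distribution.
P = np.full((2, 3), 1.0 / 3.0)
print(expected_utilities(x=[0.5, 1.0], y=[0.8, 0.2], P=P, G=0.6, C=0.4))
```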
Defense Game with Mixed Strategy
Defender chooses its detection interval, quantized into M levels, with a distribution $\mathbf{p} = [p_m]_{1 \le m \le M}$
Attacker quantizes its non-zero attack interval into N levels and chooses the attack distribution $\mathbf{q} = [q_n]_{0 \le n \le N}$
Known attack duration z
Expected utilities of the defender and APT attacker:
$U_D^{EUT}(\mathbf{p}, \mathbf{q}) = \sum_{m=1}^{M}\sum_{n=0}^{N} p_m\, q_n\left[\min\!\left(\frac{z\, n/N}{m/M},\, 1\right) + G\,\frac{m}{M}\right]$
$U_A^{EUT}(\mathbf{p}, \mathbf{q}) = \sum_{m=1}^{M}\sum_{n=0}^{N} p_m\, q_n\left[1 - \min\!\left(\frac{z\, n/N}{m/M},\, 1\right) - C\,\frac{m}{M}\,\frac{n}{N}\right]$
13
NE of the Game
Nash equilibrium (NE) of a game: no player can increase its expected utility by unilaterally deviating from its NE strategy
Based on the expected utility theory (EUT)
NE of the APT defense game with pure strategy:
$\mathbf{x}^* = \arg\max_{\mathbf{x}}\; U_D^{EUT}(\mathbf{x}, \mathbf{y}^*)$
$\mathbf{y}^* = \arg\max_{\mathbf{y}}\; U_A^{EUT}(\mathbf{x}^*, \mathbf{y})$
s.t. $\sum_{i=1}^{S} x_i \le T_x$, $\sum_{i=1}^{S} y_i \le T_y$, $0 \le x_i \le 1$, $0 \le y_i \le 1$, $1 \le i \le S$
14
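To illustrate the equilibrium concept, the sketch below (not from the talk) builds the per-level payoff matrices of the quantized game from the reconstructed utilities and lists the interval-level pairs at which neither player can gain by a unilateral deviation; such a pure-strategy pair need not exist, and the grid sizes and the values of z, G, and C are illustrative.

```python
import numpy as np

def payoff_matrices(M, N, z, G, C):
    """Per-level payoffs u_D[m, n] and u_A[m, n] of the quantized game (reconstructed form)."""
    u_D = np.zeros((M, N + 1))
    u_A = np.zeros((M, N + 1))
    for m in range(1, M + 1):          # defender's detection interval m/M
        for n in range(N + 1):         # attacker's attack interval n/N
            safe = min(z * (n / N) / (m / M), 1.0)
            u_D[m - 1, n] = safe + G * m / M
            u_A[m - 1, n] = (1.0 - safe) - C * (m / M) * (n / N)
    return u_D, u_A

def pure_strategy_ne(u_D, u_A):
    """Return all (m, n) level pairs at which neither player can unilaterally improve."""
    ne = []
    for m in range(u_D.shape[0]):
        for n in range(u_D.shape[1]):
            if u_D[m, n] >= u_D[:, n].max() and u_A[m, n] >= u_A[m, :].max():
                ne.append((m + 1, n))
    return ne

u_D, u_A = payoff_matrices(M=5, N=5, z=0.4, G=0.6, C=0.4)
print(pure_strategy_ne(u_D, u_A))
```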
Outline
Background & motivations
Game theoretic study on APT defense
APT defense game model
Prospect theory based study on APT
Colonel Blotto game based study on APT
Dynamic APT defense game
Reinforcement learning based APT defense solution
Conclusion
15
PT-based APT Defense Game
EUT-based studies of APT defense deviate from real-life decision making due to the subjectivity of attackers under uncertainty
Prospect theoretic study of cloud storage defense against subjective APT attackers:
Model the decision making of a subjective attacker under uncertainty
Pure strategy: uncertain time to hack a storage device
Mixed strategy: uncertain action of the opponent
16
Prospect Theory
17
Expected utility theory (EUT) cannot explain the deviations due to end-user subjectivity
Prospect theory (PT) [Kahneman, Tversky'79], a Nobel prize winning theory, explains these deviations in monetary decisions:
People usually overweight low-probability outcomes and underweight high-probability outcomes
Losses loom larger than gains
Prospect theory has recently been applied in many contexts:
Social sciences [Gao'10] [Harrison'09] [Tanaka'16]
Communication networks [Li'14] [Yu'14] [Yang'14] [Lee'15]
Smart energy management [Wang'14] [Xiao'14]
Allais’s Paradox
18
High Probability
GainLose
95% chance to win $10,000 100% chance to win $9,499
0.95*10000>9499Low Probability
5% chance to win $10,000 100% chance to win $501
0.05*10000<501
High Probability 95% chance to lose $10,000 100% chance to lose $9,499
0.95*10000>9499Low Probability
5% chance to lose $10,000 100% chance to lose $501
0.05*10000<501
Fear of disappointment. Risk averse.
Hope to avoid loss. Risk seeking
Fear of large loss. Risk averse
Hope of large gain.Risk seeking
Probability Weighting Functions
19
Probability weighting function models the subjectivity of a player: the subjective probability with which a player weighs an outcome that occurs with objective probability p
S-shaped and asymmetrical, ranging in [0,1]
The objective weight decreases with the player's subjective evaluation distortion
Prelec function [Prelec'98]: $w(p) = \exp\!\left(-\left(-\ln p\right)^{\alpha}\right)$, $0 < \alpha \le 1$
[Figure: Prelec weighting function w(p) for α = 1 and α = 0.5]
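A short sketch of the Prelec weighting function; the probability grid and the α values below are illustrative.

```python
import numpy as np

def prelec(p, alpha):
    """Prelec probability weighting function w(p) = exp(-(-ln p)^alpha), 0 < alpha <= 1."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return np.exp(-np.power(-np.log(p), alpha))

probs = np.array([0.01, 0.05, 0.5, 0.95, 0.99])
for a in (1.0, 0.5):                      # alpha = 1 recovers the objective probability
    print(a, np.round(prelec(probs, a), 3))
```

With α = 0.5 the small probabilities are inflated and the large ones deflated, which is the over/underweighting behavior described above.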
Related Work
PT-based channel access game between two subjective end users in a wireless network [Li'12]
Wireless operator invests in spectrum for users under uncertain spectrum supply using PT [Yu'14]
PT-based random access game between two users choosing their transmission probabilities on a radio channel [Li'14]
Stackelberg game between the service provider (SP) offering the bandwidth and subjective end users choosing services [Yang'14]
PT-based microgrid energy trading game in smart grids [Xiao'15]
20
PT-based APT Detection Game
PT-based APT detection game: model the decision making of a subjective attacker under uncertainties
Uncertain attack durations
Uncertain defense policies
21
Subjective APT Game Model
Subjective storage defense game between the attacker and defender with pure strategies under uncertain attack duration $z_i$
The attacker weighs the attack outcome with subjective probabilities $w_A(P_{il})$, $0 \le l \le L$, $1 \le i \le S$
PT-based utilities:
$U_D^{PT}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S}\left[\sum_{l=0}^{L} w_D(P_{il})\,\min\!\left(\frac{l\, y_i}{L\, x_i},\, 1\right) + G\, x_i\right]$
$U_A^{PT}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S}\left[\sum_{l=0}^{L} w_A(P_{il})\left(1 - \min\!\left(\frac{l\, y_i}{L\, x_i},\, 1\right)\right) - C\, x_i\, y_i\right]$
22
NE of the PT-based APT Defense Game
Best response of each player in terms of the PT-based utility, given that the opponent uses the NE strategy:
$\mathbf{x}^* = \arg\max_{\mathbf{x}}\; U_D^{PT}(\mathbf{x}, \mathbf{y}^*)$
$\mathbf{y}^* = \arg\max_{\mathbf{y}}\; U_A^{PT}(\mathbf{x}^*, \mathbf{y})$
s.t. $\sum_{i=1}^{S} x_i \le T_x$, $\sum_{i=1}^{S} y_i \le T_y$, $0 \le x_i \le 1$, $0 \le y_i \le 1$, $1 \le i \le S$
23
Example of the NE of the PT-based Game
24
The NE of the subjective APT game with S = 1 and L = 2 is $(x^*, y^*) = (0.5, 0)$, $(1, 0)$, or $(1, 1)$, depending on threshold conditions on the defense gain G and the attack cost C that involve the Prelec-weighted attack duration probabilities $\exp\!\left(-(-\ln P_l)^{\alpha_D}\right)$ and $\exp\!\left(-(-\ln P_l)^{\alpha_A}\right)$, $l \in \{0, 1\}$.
PT-based APT Game with Mixed Strategy
26
Each player holds a subjective view of the opponent's strategy
PT-based utilities:
$U_D^{PT}(\mathbf{p}, \mathbf{q}) = \sum_{m=1}^{M}\sum_{n=0}^{N} p_m\, w_D(q_n)\left[\min\!\left(\frac{z\, n/N}{m/M},\, 1\right) + G\,\frac{m}{M}\right]$
$U_A^{PT}(\mathbf{p}, \mathbf{q}) = \sum_{m=1}^{M}\sum_{n=0}^{N} w_A(p_m)\, q_n\left[1 - \min\!\left(\frac{z\, n/N}{m/M},\, 1\right) - C\,\frac{m}{M}\,\frac{n}{N}\right]$
NE of the game with mixed strategy:
The NE $(\mathbf{p}^*, \mathbf{q}^*)$ is characterized by the Karush-Kuhn-Tucker conditions of each player's utility maximization, with Lagrange parameters for the probability (simplex) constraints $\sum_{m=1}^{M} p_m = 1$, $p_m \ge 0$ and $\sum_{n=0}^{N} q_n = 1$, $q_n \ge 0$, where $\mathbf{1}$ denotes an all-ones column vector of the appropriate dimension.
Example of the NE of the PT-based Game with Mixed Strategy
27
If the subjective utility differences between the two interval levels are nonzero, the NE $(\mathbf{p}^*, \mathbf{q}^*)$ of the subjective game with mixed strategy and two interval levels is given in closed form in terms of the utilities $u_D$ and $u_A$ evaluated at the four interval pairs (0.5, 0), (0.5, 1), (1, 0), and (1, 1), with the NE probabilities expressed as ratios of logarithms of these utility differences.
Proof: write the Lagrangian of the defender's utility maximization under the probability (simplex) constraints on $\mathbf{p}$ and impose the Karush-Kuhn-Tucker (KKT) optimality conditions; then apply complementary slackness to obtain the closed-form NE strategy $\mathbf{p}^*$, and similarly $\mathbf{q}^*$ for the attacker.
28
29
NE of the PT-based APT Defense Game with Mixed Strategy
D. Xu, et al., "Prospect Theoretic Study of Cloud Storage Defense Against Advanced Persistent Threats," IEEE Global Commun. Conf. (GLOBECOM), 2016.
Value Distortion Functions
30
Value distortion function models the framing effect of subjective decisions: alternatives are evaluated as gains and losses with respect to a reference point $U_0$
Steeper for losses than for gains if the loss aversion coefficient $\lambda > 1$
The risk aversion/seeking coefficient decreases with the player's subjective value evaluation distortion
Value distortion function:
$v(u) = \begin{cases} (u - U_0)^{\beta}, & u \ge U_0 \\ -\lambda\,(U_0 - u)^{\beta}, & \text{o.w.} \end{cases}$
[Figure: subjective value versus objective value of the value distortion function for two parameter settings, (0.6, 0.6) and (1, 1)]
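A small sketch of the value distortion function in the form given above; the reference point, risk coefficient, and loss aversion coefficient below are illustrative choices, not values from the talk.

```python
import numpy as np

def value_distortion(u, u_ref=0.0, beta=0.8, lam=2.25):
    """Prospect-theoretic value function: concave for gains, convex and steeper for losses."""
    u = np.asarray(u, dtype=float)
    gain = np.maximum(u - u_ref, 0.0) ** beta
    loss = -lam * np.maximum(u_ref - u, 0.0) ** beta
    return np.where(u >= u_ref, gain, loss)

print(value_distortion([-4.0, -1.0, 0.0, 1.0, 4.0]))
```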
Utility in the CPT-based Game
31
The objective utilities $u_{S}(x, y, z_l)$, $0 \le l \le L$, of player $S \in \{D, A\}$ are reordered in ascending order, together with the corresponding attack duration probabilities. The CPT-based utility sums the CPT-value of the losses (reordered outcomes below the reference point, indices $l \le K$) and the CPT-value of the gains (indices $l > K$), each weighted by differences of the cumulatively weighted reordered probabilities:
$U_S^{CPT}(\mathbf{x}, \mathbf{y}) = \sum_{l=0}^{K} v\!\left(u_S^{(l)}\right)\left[w_S\!\left(\sum_{j=0}^{l} p^{(j)}\right) - w_S\!\left(\sum_{j=0}^{l-1} p^{(j)}\right)\right] + \sum_{l=K+1}^{L} v\!\left(u_S^{(l)}\right)\left[w_S\!\left(\sum_{j=l}^{L} p^{(j)}\right) - w_S\!\left(\sum_{j=l+1}^{L} p^{(j)}\right)\right]$
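The sketch below (not from the talk) combines the weighting and value functions into a cumulative prospect theory value of a discrete set of outcomes, following the standard rank-dependent construction (loss probabilities cumulated from the worst outcome, gain probabilities from the best); the outcomes, probabilities, and parameters are illustrative and the notation differs slightly from the slide.

```python
import numpy as np

def prelec_w(p, alpha):
    """Prelec probability weighting, also used for the cumulative weights below."""
    return np.exp(-(-np.log(np.clip(p, 1e-12, 1.0))) ** alpha)

def value_v(u, u_ref, beta, lam):
    """Prospect-theoretic value distortion (gains concave, losses steeper)."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= u_ref,
                    np.maximum(u - u_ref, 0.0) ** beta,
                    -lam * np.maximum(u_ref - u, 0.0) ** beta)

def cpt_value(outcomes, probs, alpha=0.6, beta=0.8, lam=2.25, u_ref=0.0):
    """Cumulative prospect theory value of a discrete lottery (standard construction)."""
    order = np.argsort(outcomes)            # ascending: worst loss first, best gain last
    u = np.asarray(outcomes, float)[order]
    p = np.asarray(probs, float)[order]
    total = 0.0
    for k in range(len(u)):
        if u[k] < u_ref:   # losses: cumulate probabilities from the worst outcome upward
            dw = prelec_w(p[:k + 1].sum(), alpha) - prelec_w(p[:k].sum(), alpha)
        else:              # gains: cumulate probabilities from the best outcome downward
            dw = prelec_w(p[k:].sum(), alpha) - prelec_w(p[k + 1:].sum(), alpha)
        total += float(value_v(u[k], u_ref, beta, lam)) * dw
    return total

print(cpt_value(outcomes=[-2.0, -0.5, 1.0, 3.0], probs=[0.1, 0.4, 0.4, 0.1]))
```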
NE of the CPT-based Game
32
D. Xu, et al., "Cumulative Prospect Theoretic Study of a Cloud Storage Defense Game Against Advanced Persistent Threats," IEEE INFOCOM -BigSecurity, 2017.
Outline
Background & motivations
Game theoretic study on APT defense
APT defense game model
Prospect theory based study on APT
Colonel Blotto game based study on APT
Dynamic APT defense game
Reinforcement learning based APT defense solution
Conclusion
33
Colonel Blotto Game
34
Colonel Blotto game (CBG): each Colonel has limited resources
A powerful tool to study strategic resource allocation in a competitive environment
Related Work
CBG-based phishing game in terms of the detect-and-takedown defense against phishing attacks [Chia'11]
CBG-based anti-jamming communication game in terms of the power allocation over multiple channels in cognitive radio networks [Wu'12]
CBG-based anti-jamming communication game in heterogeneous Internet of Things [Labib'15]
CBG-based spectrum allocation game among multiple network service providers [Hajimirsadeghi'16]
35
CBG-based CPU Allocation Game
36
A pure-strategy NE does not always exist in the CBG
Data stored on storage device i at time k is $B_i^k$
Data protection level: normalized size of the "safe" data protected by the defender,
$R^k = \frac{1}{\sum_{i=1}^{D} B_i^k}\sum_{i=1}^{D} B_i^k\,\mathrm{sgn}\!\left(M_i^k - N_i^k\right)$
where $M_i^k$ and $N_i^k$ denote the numbers of CPUs that the defender and the attacker allocate to device i
APT Defense Game with Mixed Strategy
37
Each player chooses its CPU allocation with randomness to fool the opponent
Mixed strategy: the defender chooses the number of CPUs allocated to each device at random, with $x_{i,j}^k = \Pr\!\left(M_i^k = j\right)$ and $\sum_{j} x_{i,j}^k = 1$ for $1 \le i \le D$; similarly, the attacker uses $y_{i,j}^k = \Pr\!\left(N_i^k = j\right)$ with $\sum_{j} y_{i,j}^k = 1$
Expected utility of the defender/attacker at time k:
$U_D^k = -U_A^k = \mathbb{E}\left[\sum_{i=1}^{D} B_i^k\,\mathrm{sgn}\!\left(M_i^k - N_i^k\right)\right]$
where the expectation is taken over the mixed strategies of both players
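A small sketch (not from the talk) of the data protection level and its expectation under mixed CPU allocations, following the sgn-based formula reconstructed above; the data sizes and candidate allocation vectors are illustrative.

```python
import numpy as np

def data_protection_level(B, M_alloc, N_alloc):
    """Data protection level R at one time slot: the data-size-weighted sign of the
    defender's CPU surplus on each device (reconstructed formula)."""
    B = np.asarray(B, float)
    surplus = np.sign(np.asarray(M_alloc) - np.asarray(N_alloc))
    return float(np.sum(B * surplus) / np.sum(B))

def expected_protection(B, defender_allocs, attacker_allocs, p, q):
    """Expected protection level when both players randomize over allocation vectors."""
    return sum(p_i * q_j * data_protection_level(B, M, N)
               for p_i, M in zip(p, defender_allocs)
               for q_j, N in zip(q, attacker_allocs))

B = [40, 25, 35]                                  # data sizes on D = 3 devices
defender_allocs = [[8, 4, 4], [6, 6, 4]]          # two candidate defender CPU splits (16 CPUs)
attacker_allocs = [[4, 0, 0], [0, 2, 2]]          # two candidate attacker CPU splits (4 CPUs)
print(expected_protection(B, defender_allocs, attacker_allocs, p=[0.5, 0.5], q=[0.5, 0.5]))
```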
NE of the CBG Based APT Game
38
[Figures: data protection level R and utility of the defender versus the total number of defense CPUs S_M, from 600 to 1200, for D = 20, 40, and 80 storage devices]
D storage devices and SM defense CPUs against an APT attacker with 150 CPUs
M. Min, et al., "Defense Against Advanced Persistent Threats: A Colonel Blotto Game Approach," IEEE International Conference on Communications (ICC), 2017.
Outline
Background & motivations
Game theoretic study on APT defense
APT defense game model
Prospect theory based study on APT
Colonel Blotto game based study on APT
Dynamic APT defense game
Reinforcement learning based APT defense solution
Conclusion
39
Dynamic APT Defense Game
Repeated interactions between the defender and the APT attacker, each choosing its attack/scan interval or number of CPUs
The defender usually does not know the attack model
40
Dynamic APT defense game can be viewed as a Markov decision process (MDP)
[Diagram: the APT defender observes the state, chooses a defense strategy, receives a utility, and moves to a new state of the Markov decision process]
Outline
Background & motivations
Game theoretic study on APT defense
APT defense game model
Prospect theory based study on APT
Colonel Blotto game based study on APT
Dynamic APT defense game
Reinforcement learning based APT defense solution
Conclusion
42
Reinforcement Learning
43
Agent uses reinforcement learning such as Q-learning to choose its policy without knowing the system model in a dynamic APT defense game
Achieves the optimal defense policy via trial and error after a sufficiently long time in a finite-state MDP
[Diagram: the learner observes the current state of the environment (MDP), takes an action, and receives a reward and the next state]
The defender applies Q-learning to choose the defense interval without knowing the attack and network models in the dynamic game
Q-learning: a model-free reinforcement learning algorithm for an agent to derive its optimal strategy via trial and error
State: previous attack duration
Q-function: expected long-term discounted utility of taking a given action in a given state
Epsilon-greedy exploration is used to trade off exploration and exploitation in the learning process
Q-learning Based APT Detection
45
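A minimal Q-learning sketch for the defender; the state/action encoding (attack duration level as state, scan interval level as action), the toy environment, and the learning parameters are illustrative assumptions rather than the exact setup of the talk.

```python
import numpy as np

def q_learning_defense(env_step, n_states, n_actions,
                       episodes=5000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Q-learning for choosing the scan-interval level: state = previous attack
    duration level, action = scan-interval level (both illustrative encodings)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    state = 0
    for _ in range(episodes):
        # Epsilon-greedy: explore with probability eps, otherwise exploit.
        if rng.random() < eps:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward = env_step(state, action)
        # Q-update toward the one-step bootstrapped target.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
    return Q

# Toy environment stub: reward is higher when the scan level matches the attack level.
def toy_env(state, action, n_levels=5):
    next_state = np.random.randint(n_levels)      # next observed attack duration level
    reward = 1.0 - abs(action - state) / n_levels
    return next_state, reward

Q = q_learning_defense(toy_env, n_states=5, n_actions=5)
print(np.argmax(Q, axis=1))                        # learned scan level per observed state
```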
47-48
[Simulation results: attack rate and utility of the defender. The number of non-zero attack interval levels is 5, the attack cost is 0.4, the defense gain is 0.6, and the objective weight of the attacker/defender is 0.8/1.]
L. Xiao, et al., "Cloud Storage Defense Against Advanced Persistent Threats: A Prospect Theoretic Study," IEEE Journal on Selected Areas in Communications, 2017.
Hotbooting PHC based APT Detection
Hotbooting: exploit experiences from similar scenarios to initialize the learning parameters and thus avoid unnecessary exploration at the beginning of learning
PHC: an extension of Q-learning for the mixed-strategy game that uses randomness to fool the APT attacker
Choose the detection interval according to a mixed-strategy table updated with:
49
$\pi\!\left(s^k, x\right) \leftarrow \pi\!\left(s^k, x\right) + \begin{cases} \delta, & \text{if } x = \arg\max_{x'} Q\!\left(s^k, x'\right) \\ -\dfrac{\delta}{|\mathcal{X}| - 1}, & \text{o.w.} \end{cases}$
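A sketch of the policy-hill-climbing (PHC) style update of the mixed-strategy table described above: the probability of the greedy action (with respect to the current Q values) is increased by a small step δ and the other actions are decreased accordingly; the step size and table dimensions are illustrative.

```python
import numpy as np

def phc_update(pi, Q, state, delta=0.01):
    """One PHC step on the mixed-strategy table pi[state] using the current Q values."""
    n_actions = pi.shape[1]
    greedy = int(np.argmax(Q[state]))
    for a in range(n_actions):
        if a == greedy:
            pi[state, a] += delta
        else:
            pi[state, a] -= delta / (n_actions - 1)
    # Project back onto the probability simplex (clip and renormalize).
    pi[state] = np.clip(pi[state], 0.0, 1.0)
    pi[state] /= pi[state].sum()
    return pi

n_states, n_actions = 5, 5
pi = np.full((n_states, n_actions), 1.0 / n_actions)   # start from the uniform mixed strategy
Q = np.random.rand(n_states, n_actions)
pi = phc_update(pi, Q, state=2)
print(np.round(pi[2], 3))
```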
APT Detection Performance
52
Attack cost is 0.51, APT detection gain is 0.4, objective weight and risk aversion coefficient of the security agent are 1, objective weight of the attacker is 0.3, risk seeking coefficient of the attacker is 0.6, and the reference utility of the attacker/security agent is 0
Data Protection Level
53
L. Xiao, et al., "Attacker-Centric View of a Detection Game Against Advanced Persistent Threats," IEEE Trans. Mobile Computing, 2018.
DQN-based CPU Allocation Against APT
The deep Q-network (DQN)-based CPU allocation scheme uses deep learning to compress the state space observed by the defender and further accelerate learning
A CNN is used to estimate the long-term expected reward of each CPU allocation policy for a given state
The CNN parameters are updated over minibatches with stochastic gradient descent
54
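A compact sketch (assuming PyTorch) of the DQN update described above: a small convolutional Q-network maps the defender's state, treated as a 1-D signal, to a Q-value per allocation action, and a minibatch stochastic gradient step moves the network toward the bootstrapped target; the architecture, replay buffer, and hyperparameters are illustrative and may differ from those used in the talk.

```python
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small CNN that maps the defender's state to a Q-value per CPU-allocation action."""
    def __init__(self, state_len, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(8 * state_len, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):                  # x: (batch, state_len)
        return self.net(x.unsqueeze(1))    # add the channel dimension for Conv1d

def dqn_minibatch_step(qnet, optimizer, replay, batch_size=32, gamma=0.9):
    """One minibatch SGD step on the squared TD error."""
    batch = random.sample(replay, batch_size)
    s, a, r, s2 = (torch.stack(t) for t in zip(*batch))
    q_sa = qnet(s).gather(1, a.view(-1, 1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * qnet(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

# Toy usage: random transitions for a state of length 6 and 10 allocation actions.
qnet = QNet(state_len=6, n_actions=10)
optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
replay = [(torch.rand(6), torch.randint(10, ()), torch.rand(()), torch.rand(6))
          for _ in range(200)]
for _ in range(5):
    print(dqn_minibatch_step(qnet, optimizer, replay))
```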
Data Protection Level
56
3 storage devices and 16 defense CPUs against an APT attacker with 4 attack CPUs; the data size of each device changes every 1000 time slots
Conclusion
Game theoretic study of APT defense provides insights for the design of secure wireless networks
The PT-based APT defense game shows how the subjectivity of an APT attacker under uncertainty impacts network security
The CBG-based APT defense game investigates how to efficiently allocate CPUs to scan the storage devices to detect APTs
Reinforcement learning based APT defense strategies achieve the optimal detection performance in the dynamic APT defense games
PHC-based defense uses random action selection to fool the attackers
DQN-based defense uses a CNN to compress the state space and thus accelerate learning
Future work
Improve the game model by incorporating more APT details
Accelerate the learning speed of the RL-based APT defense strategies
58