Liang Xiao, Xiamen University
International Symposium on Modeling and Optimization in Mobile, Ad Hoc and Wireless Networks (WiOpt)
Shanghai, May 7, 2018
Game-theoretic Methods for Modeling and Defending Against Advanced Persistent Threats in Wireless Systems
Outline
Background & motivations
Game theoretic study on APT defense
APT defense game model
Prospect theory based study on APT
Colonel Blotto game based study on APT
Dynamic APT defense game
Reinforcement learning based APT defense solution
Conclusion
2
Slingshot APT used malware to spy on international targets from 2012 until Feb. 2018
Lazarus APT stole $81 million from Central Bank of Bangladesh in 2016
$31 million was stolen from the Russian central bank in 2016
500 million Yahoo user accounts were leaked in 2016
APTs in Cyber Systems
3 [Threat Landscape Survey'17]
APT Attacks
Target the objective
Infiltrate malware onto the target system
Control by command-and-control server (installing other malware)
Spread the attack onto several systems
Exfiltrate target data from the victim's network
Cover tracks to maintain access to the victim's system over a long time
4
Another APT Example
Reconnaissance: leverage information from a variety of sources to understand the target
Incursion: use social engineering to deliver targeted malware
Discovery: stay "low and slow", map the defenses from the inside and create a battle plan
Capture: collect information over an extended period, install malware to secretly acquire data or disrupt operations
Exfiltration: send information to the attacker
5
Hackers cover their tracks to access the data for a long time without being detected
APT attackers aim to steal data rather than break the target network or destroy the data
APT attackers use multiple sophisticated attack methods
Defense against APTs: detect APTs as early as possible to minimize the damage, instead of trying to keep attackers out
Challenges to Detect APTs
6
Outline
Background & motivations
Game theoretic study on APT defense
APT defense game model
Prospect theory based study on APT
Colonel Blotto game based study on APT
Dynamic APT defense game
Reinforcement learning based APT defense solution
Conclusion
7
APT attackers are motivated by their hunger for money and secrets
APTs are stealthy, continuous, sophisticated and long-term
Game theory can capture the long-term, continuous interaction between the APT attacker and the defender over the system resources:
Model the interactions between the APT attackers and defenders
Understand the fundamental tradeoffs between the security risk and the APT defense cost
Help design APT defense strategies
Game Theoretic Study on APTs
8
APT Defense Game
APT game between an attacker choosing its attack path and a defender selecting its best response according to the attack classification result [Fang'14]
Stealthy attack game with an asymmetric feedback model under limited attack or protection resources [Zhang'15]
Cyber-physical signaling game between a cloud and a mobile device, and a FlipIt game between an APT attacker and a cloud defender [Pawlick'15]
Three-player Stackelberg game, in which a defender as the leader addresses both APTs and insider attacks [Feng'15]
Zero-sum matrix game against APT movements without being aware of the adversary model [Rass'17]
Dynamic APT game between an attacker choosing its attack resources and a defender determining its prevention and recovery strength [Yang'17]
9
FlipIt Game: non-zero-sum game between an APT attacker and a defender competing for a cyber resource [Dijk'13]
Attacker: whether to compromise the cyber system
Defender: whether to restore the cyber system
Goal: each player maximizes its utility, which increases with its resource control time and decreases with its attack/defense cost
FlipIt Game
10
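To make the FlipIt interaction above concrete, here is a minimal Python sketch (not part of the original talk) that simulates a FlipIt-style game in which both players move periodically with random phases; the periods, per-move costs, and time horizon are illustrative assumptions.

```python
import numpy as np

def flipit_utilities(attack_period, defend_period, attack_cost, defend_cost,
                     horizon=10_000.0, seed=0):
    """Simulate a FlipIt-style game with periodic moves and random phases.

    Each player's utility is the fraction of time it controls the resource
    minus its per-move cost times its move rate (an illustrative model in the
    spirit of the FlipIt formulation of [Dijk'13]).
    """
    rng = np.random.default_rng(seed)
    # Random phases so neither player knows when the other moves.
    attack_times = np.arange(rng.uniform(0, attack_period), horizon, attack_period)
    defend_times = np.arange(rng.uniform(0, defend_period), horizon, defend_period)
    # Merge the move sequences and track who controls the resource over time.
    events = sorted([(t, 'A') for t in attack_times] + [(t, 'D') for t in defend_times])
    owner, last_t, attacker_time = 'D', 0.0, 0.0
    for t, player in events:
        if owner == 'A':
            attacker_time += t - last_t
        owner, last_t = player, t
    if owner == 'A':
        attacker_time += horizon - last_t
    attacker_frac = attacker_time / horizon
    u_attacker = attacker_frac - attack_cost / attack_period
    u_defender = (1.0 - attacker_frac) - defend_cost / defend_period
    return u_defender, u_attacker

print(flipit_utilities(attack_period=5.0, defend_period=3.0,
                       attack_cost=1.0, defend_cost=0.5))
```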
APT Defense Game Model
APT attacker chooses its time interval $y_i$ to launch APTs on storage device i
Defender chooses its time interval $x_i$ to scan the devices
Random duration $z_i$ to complete an APT attack
Safe time for the data stored on storage device i: $\min\!\left(\frac{z_i\, y_i}{x_i},\, 1\right)$
11
APT Defense Game with Pure Strategy
Normalized attack/defense interval over each device
Random attack duration $z_i$ quantized into L non-zero levels, each with probability $P_{il}$, $0 \le l \le L$
Limited overall attack/defense computation resources
Expected utilities of the defender and the attacker:
$U_D^{EUT}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S}\left[\sum_{l=0}^{L} P_{il}\,\min\!\left(\frac{l\, y_i}{L\, x_i},\, 1\right) + G\, x_i\right]$
$U_A^{EUT}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S}\left[\sum_{l=0}^{L} P_{il}\left(1 - \min\!\left(\frac{l\, y_i}{L\, x_i},\, 1\right)\right) - C\, x_i\, y_i\right]$
where the $G x_i$ term is the gain of a longer scan interval and the $C$ term is the attack cost
Both players choose their policies to maximize their own expected utilities
12
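A minimal sketch (not from the talk) that evaluates the expected utilities in the form reconstructed above; the roles of x_i (scan interval) and y_i (attack interval) and the exact shape of the attack cost term follow that reconstruction and may differ in detail from the original formulation.

```python
import numpy as np

def expected_utilities(x, y, P, G, C):
    """EUT utilities of the defender and attacker, using the reconstructed form above.

    x[i]   : normalized scan interval of storage device i (defender)
    y[i]   : normalized attack interval of device i (attacker)
    P[i,l] : probability that the attack duration on device i is z = l/L, 0 <= l <= L
    """
    x, y, P = np.asarray(x, float), np.asarray(y, float), np.asarray(P, float)
    L = P.shape[1] - 1
    l = np.arange(L + 1)
    # Safe-time fraction min(z*y/x, 1) for each device i and duration level l.
    safe = np.minimum((l[None, :] / L) * y[:, None] / np.maximum(x[:, None], 1e-12), 1.0)
    u_def = np.sum(np.sum(P * safe, axis=1) + G * x)
    u_att = np.sum(np.sum(P * (1.0 - safe), axis=1) - C * x * y)
    return u_def, u_att

# Toy example: S = 2 devices, L = 2 duration levels, uniform duration distribution.
P = np.full((2, 3), 1.0 / 3.0)
print(expected_utilities(x=[0.5, 1.0], y=[0.8, 0.2], P=P, G=0.6, C=0.4))
```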
Defense Game with Mixed Strategy
Defender chooses its detection interval, quantized into M levels, with a distribution $\mathbf{p} = [p_m]_{1 \le m \le M}$
Attacker quantizes its non-zero attack interval into N levels and chooses the attack distribution $\mathbf{q} = [q_n]_{0 \le n \le N}$
Known attack duration z
Expected utilities of the defender and APT attacker:
$U_D^{EUT}(\mathbf{p}, \mathbf{q}) = \sum_{m=1}^{M}\sum_{n=0}^{N} p_m\, q_n\left[\min\!\left(\frac{z\, n/N}{m/M},\, 1\right) + G\,\frac{m}{M}\right]$
$U_A^{EUT}(\mathbf{p}, \mathbf{q}) = \sum_{m=1}^{M}\sum_{n=0}^{N} p_m\, q_n\left[1 - \min\!\left(\frac{z\, n/N}{m/M},\, 1\right) - C\,\frac{m}{M}\,\frac{n}{N}\right]$
13
NE of the Game
Nash equilibrium (NE) of a game: no player can increase its expected utility by unilaterally deviating from its NE strategy
Based on the expected utility theory (EUT)
NE of the APT defense game with pure strategy:
$\mathbf{x}^* = \arg\max_{\mathbf{x}}\; U_D^{EUT}(\mathbf{x}, \mathbf{y}^*)$
$\mathbf{y}^* = \arg\max_{\mathbf{y}}\; U_A^{EUT}(\mathbf{x}^*, \mathbf{y})$
s.t. $\sum_{i=1}^{S} x_i \le T_x$, $\sum_{i=1}^{S} y_i \le T_y$, $0 \le x_i \le 1$, $0 \le y_i \le 1$, $1 \le i \le S$
14
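To illustrate the equilibrium concept, the sketch below (not from the talk) builds the per-level payoff matrices of the quantized game from the reconstructed utilities and lists the interval-level pairs at which neither player can gain by a unilateral deviation; such a pure-strategy pair need not exist, and the grid sizes and the values of z, G, and C are illustrative.

```python
import numpy as np

def payoff_matrices(M, N, z, G, C):
    """Per-level payoffs u_D[m, n] and u_A[m, n] of the quantized game (reconstructed form)."""
    u_D = np.zeros((M, N + 1))
    u_A = np.zeros((M, N + 1))
    for m in range(1, M + 1):          # defender's detection interval m/M
        for n in range(N + 1):         # attacker's attack interval n/N
            safe = min(z * (n / N) / (m / M), 1.0)
            u_D[m - 1, n] = safe + G * m / M
            u_A[m - 1, n] = (1.0 - safe) - C * (m / M) * (n / N)
    return u_D, u_A

def pure_strategy_ne(u_D, u_A):
    """Return all (m, n) level pairs at which neither player can unilaterally improve."""
    ne = []
    for m in range(u_D.shape[0]):
        for n in range(u_D.shape[1]):
            if u_D[m, n] >= u_D[:, n].max() and u_A[m, n] >= u_A[m, :].max():
                ne.append((m + 1, n))
    return ne

u_D, u_A = payoff_matrices(M=5, N=5, z=0.4, G=0.6, C=0.4)
print(pure_strategy_ne(u_D, u_A))
```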
Outline
Background & motivations
Game theoretic study on APT defense
APT defense game model
Prospect theory based study on APT
Colonel Blotto game based study on APT
Dynamic APT defense game
Reinforcement learning based APT defense solution
Conclusion
15
PT-based APT Defense Game
EUT-based studies of APT defense deviate from real-life decision making due to the subjectivity of attackers under uncertainty
Prospect theoretic study of cloud storage defense against subjective APT attackers:
Model the decision making of a subjective attacker under uncertainty
Pure strategy: uncertain time to hack a storage device
Mixed strategy: uncertain action of the opponent
16
Prospect Theory
17
Expected utility theory (EUT) cannot explain the deviations due to end-user subjectivity
Prospect theory (PT) [Kahneman, Tversky'79], a Nobel prize winning theory, explains these deviations in monetary decisions:
People usually overweight low-probability outcomes and underweight high-probability outcomes
Losses loom larger than gains
Prospect theory has recently been applied in many contexts:
Social sciences [Gao'10] [Harrison'09] [Tanaka'16]
Communication networks [Li'14] [Yu'14] [Yang'14] [Lee'15]
Smart energy management [Wang'14] [Xiao'14]
Allais’s Paradox
18
High Probability
GainLose
95% chance to win $10,000 100% chance to win $9,499
0.95*10000>9499Low Probability
5% chance to win $10,000 100% chance to win $501
0.05*10000<501
High Probability 95% chance to lose $10,000 100% chance to lose $9,499
0.95*10000>9499Low Probability
5% chance to lose $10,000 100% chance to lose $501
0.05*10000<501
Fear of disappointment. Risk averse.
Hope to avoid loss. Risk seeking
Fear of large loss. Risk averse
Hope of large gain.Risk seeking
Probability Weighting Functions
19
Probability weighting function models the subjectivity of a player: the subjective probability with which a player weighs an outcome that occurs with objective probability p
S-shaped and asymmetrical, ranging in [0,1]
The objective weight decreases with the player's subjective evaluation distortion
Prelec function [Prelec'98]: $w(p) = \exp\!\left(-\left(-\ln p\right)^{\alpha}\right)$, $0 < \alpha \le 1$
[Figure: Prelec weighting function w(p) for α = 1 and α = 0.5]
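A short sketch of the Prelec weighting function; the probability grid and the α values below are illustrative.

```python
import numpy as np

def prelec(p, alpha):
    """Prelec probability weighting function w(p) = exp(-(-ln p)^alpha), 0 < alpha <= 1."""
    p = np.clip(np.asarray(p, dtype=float), 1e-12, 1.0)
    return np.exp(-np.power(-np.log(p), alpha))

probs = np.array([0.01, 0.05, 0.5, 0.95, 0.99])
for a in (1.0, 0.5):                      # alpha = 1 recovers the objective probability
    print(a, np.round(prelec(probs, a), 3))
```

With α = 0.5 the small probabilities are inflated and the large ones deflated, which is the over/underweighting behavior described above.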
Related Work
PT-based channel access game between two subjective end users in a wireless network [Li'12]
Wireless operator invests in spectrum for users under uncertain spectrum supply using PT [Yu'14]
PT-based random access game between two users choosing their transmission probabilities on a radio channel [Li'14]
Stackelberg game between the service provider (SP) offering the bandwidth and subjective end users choosing services [Yang'14]
PT-based microgrid energy trading game in smart grids [Xiao'15]
20
PT-based APT Detection Game
PT-based APT detection game: model the decision making of a subjective attacker under uncertainties
Uncertain attack durations
Uncertain defense policies
21
Subjective APT Game Model
Subjective storage defense game between the attacker and defender with pure strategies under uncertain attack duration $z_i$
The attacker weighs the attack outcome with subjective probabilities $w_A(P_{il})$, $0 \le l \le L$, $1 \le i \le S$
PT-based utilities:
$U_D^{PT}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S}\left[\sum_{l=0}^{L} w_D(P_{il})\,\min\!\left(\frac{l\, y_i}{L\, x_i},\, 1\right) + G\, x_i\right]$
$U_A^{PT}(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{S}\left[\sum_{l=0}^{L} w_A(P_{il})\left(1 - \min\!\left(\frac{l\, y_i}{L\, x_i},\, 1\right)\right) - C\, x_i\, y_i\right]$
22
NE of the PT-based APT Defense Game
Best response of each player in terms of the PT-based utility, given that the opponent uses the NE strategy:
$\mathbf{x}^* = \arg\max_{\mathbf{x}}\; U_D^{PT}(\mathbf{x}, \mathbf{y}^*)$
$\mathbf{y}^* = \arg\max_{\mathbf{y}}\; U_A^{PT}(\mathbf{x}^*, \mathbf{y})$
s.t. $\sum_{i=1}^{S} x_i \le T_x$, $\sum_{i=1}^{S} y_i \le T_y$, $0 \le x_i \le 1$, $0 \le y_i \le 1$, $1 \le i \le S$
23
Example of the NE of the PT-based Game
24
The NE of the subjective APT game with S = 1 and L = 2 is $(x^*, y^*) = (0.5, 0)$, $(1, 0)$, or $(1, 1)$, depending on threshold conditions on the defense gain G and the attack cost C that involve the Prelec-weighted attack duration probabilities $\exp\!\left(-(-\ln P_l)^{\alpha_D}\right)$ and $\exp\!\left(-(-\ln P_l)^{\alpha_A}\right)$, $l \in \{0, 1\}$.
PT-based APT Game with Mixed Strategy
26
Each player holds a subjective view of the opponent's strategy
PT-based utilities:
$U_D^{PT}(\mathbf{p}, \mathbf{q}) = \sum_{m=1}^{M}\sum_{n=0}^{N} p_m\, w_D(q_n)\left[\min\!\left(\frac{z\, n/N}{m/M},\, 1\right) + G\,\frac{m}{M}\right]$
$U_A^{PT}(\mathbf{p}, \mathbf{q}) = \sum_{m=1}^{M}\sum_{n=0}^{N} w_A(p_m)\, q_n\left[1 - \min\!\left(\frac{z\, n/N}{m/M},\, 1\right) - C\,\frac{m}{M}\,\frac{n}{N}\right]$
NE of the game with mixed strategy:
The NE $(\mathbf{p}^*, \mathbf{q}^*)$ is characterized by the Karush-Kuhn-Tucker conditions of each player's utility maximization, with Lagrange parameters for the probability (simplex) constraints $\sum_{m=1}^{M} p_m = 1$, $p_m \ge 0$ and $\sum_{n=0}^{N} q_n = 1$, $q_n \ge 0$, where $\mathbf{1}$ denotes an all-ones column vector of the appropriate dimension.
Example of the NE of the PT-based Game with Mixed Strategy
27
If the subjective utility differences between the two interval levels are nonzero, the NE $(\mathbf{p}^*, \mathbf{q}^*)$ of the subjective game with mixed strategy and two interval levels is given in closed form in terms of the utilities $u_D$ and $u_A$ evaluated at the four interval pairs (0.5, 0), (0.5, 1), (1, 0), and (1, 1), with the NE probabilities expressed as ratios of logarithms of these utility differences.
Proof: write the Lagrangian of the defender's utility maximization under the probability (simplex) constraints on $\mathbf{p}$ and impose the Karush-Kuhn-Tucker (KKT) optimality conditions; then apply complementary slackness to obtain the closed-form NE strategy $\mathbf{p}^*$, and similarly $\mathbf{q}^*$ for the attacker.
28
29
NE of the PT-based APT Defense Game with Mixed Strategy
D. Xu, et al., "Prospect Theoretic Study of Cloud Storage Defense Against Advanced Persistent Threats," IEEE Global Commun. Conf. (GLOBECOM), 2016.
Value Distortion Functions
30
Value distortion function models the framing effect of subjective decisions: alternatives are evaluated as gains and losses with respect to a reference point $U_0$
Steeper for losses than for gains if the loss aversion coefficient $\lambda > 1$
The risk aversion/seeking coefficient decreases with the player's subjective value evaluation distortion
Value distortion function:
$v(u) = \begin{cases} (u - U_0)^{\beta}, & u \ge U_0 \\ -\lambda\,(U_0 - u)^{\beta}, & \text{o.w.} \end{cases}$
[Figure: subjective value versus objective value of the value distortion function for two parameter settings, (0.6, 0.6) and (1, 1)]
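A small sketch of the value distortion function in the form given above; the reference point, risk coefficient, and loss aversion coefficient below are illustrative choices, not values from the talk.

```python
import numpy as np

def value_distortion(u, u_ref=0.0, beta=0.8, lam=2.25):
    """Prospect-theoretic value function: concave for gains, convex and steeper for losses."""
    u = np.asarray(u, dtype=float)
    gain = np.maximum(u - u_ref, 0.0) ** beta
    loss = -lam * np.maximum(u_ref - u, 0.0) ** beta
    return np.where(u >= u_ref, gain, loss)

print(value_distortion([-4.0, -1.0, 0.0, 1.0, 4.0]))
```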
Utility in the CPT-based Game
31
The objective utilities $u_{S}(x, y, z_l)$, $0 \le l \le L$, of player $S \in \{D, A\}$ are reordered in ascending order, together with the corresponding attack duration probabilities. The CPT-based utility sums the CPT-value of the losses (reordered outcomes below the reference point, indices $l \le K$) and the CPT-value of the gains (indices $l > K$), each weighted by differences of the cumulatively weighted reordered probabilities:
$U_S^{CPT}(\mathbf{x}, \mathbf{y}) = \sum_{l=0}^{K} v\!\left(u_S^{(l)}\right)\left[w_S\!\left(\sum_{j=0}^{l} p^{(j)}\right) - w_S\!\left(\sum_{j=0}^{l-1} p^{(j)}\right)\right] + \sum_{l=K+1}^{L} v\!\left(u_S^{(l)}\right)\left[w_S\!\left(\sum_{j=l}^{L} p^{(j)}\right) - w_S\!\left(\sum_{j=l+1}^{L} p^{(j)}\right)\right]$
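The sketch below (not from the talk) combines the weighting and value functions into a cumulative prospect theory value of a discrete set of outcomes, following the standard rank-dependent construction (loss probabilities cumulated from the worst outcome, gain probabilities from the best); the outcomes, probabilities, and parameters are illustrative and the notation differs slightly from the slide.

```python
import numpy as np

def prelec_w(p, alpha):
    """Prelec probability weighting, also used for the cumulative weights below."""
    return np.exp(-(-np.log(np.clip(p, 1e-12, 1.0))) ** alpha)

def value_v(u, u_ref, beta, lam):
    """Prospect-theoretic value distortion (gains concave, losses steeper)."""
    u = np.asarray(u, dtype=float)
    return np.where(u >= u_ref,
                    np.maximum(u - u_ref, 0.0) ** beta,
                    -lam * np.maximum(u_ref - u, 0.0) ** beta)

def cpt_value(outcomes, probs, alpha=0.6, beta=0.8, lam=2.25, u_ref=0.0):
    """Cumulative prospect theory value of a discrete lottery (standard construction)."""
    order = np.argsort(outcomes)            # ascending: worst loss first, best gain last
    u = np.asarray(outcomes, float)[order]
    p = np.asarray(probs, float)[order]
    total = 0.0
    for k in range(len(u)):
        if u[k] < u_ref:   # losses: cumulate probabilities from the worst outcome upward
            dw = prelec_w(p[:k + 1].sum(), alpha) - prelec_w(p[:k].sum(), alpha)
        else:              # gains: cumulate probabilities from the best outcome downward
            dw = prelec_w(p[k:].sum(), alpha) - prelec_w(p[k + 1:].sum(), alpha)
        total += float(value_v(u[k], u_ref, beta, lam)) * dw
    return total

print(cpt_value(outcomes=[-2.0, -0.5, 1.0, 3.0], probs=[0.1, 0.4, 0.4, 0.1]))
```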
NE of the CPT-based Game
32
D. Xu, et al., "Cumulative Prospect Theoretic Study of a Cloud Storage Defense Game Against Advanced Persistent Threats," IEEE INFOCOM -BigSecurity, 2017.
Outline
Background & motivations
Game theoretic study on APT defense
APT defense game model
Prospect theory based study on APT
Colonel Blotto game based study on APT
Dynamic APT defense game
Reinforcement learning based APT defense solution
Conclusion
33
Colonel Blotto Game
34
Colonel Blotto game (CBG): each Colonel has limited resources
A powerful tool to study strategic resource allocation in a competitive environment
Related Work
CBG-based phishing game in terms of the detect-and-takedown defense against phishing attacks [Chia'11]
CBG-based anti-jamming communication game in terms of the power allocation over multiple channels in cognitive radio networks [Wu'12]
CBG-based anti-jamming communication game in heterogeneous Internet of Things [Labib'15]
CBG-based spectrum allocation game among multiple network service providers [Hajimirsadeghi'16]
35
CBG-based CPU Allocation Game
36
A pure-strategy NE does not always exist in the CBG
Data stored on storage device i at time k is $B_i^k$
Data protection level: normalized size of the "safe" data protected by the defender,
$R^k = \frac{1}{\sum_{i=1}^{D} B_i^k}\sum_{i=1}^{D} B_i^k\,\mathrm{sgn}\!\left(M_i^k - N_i^k\right)$
where $M_i^k$ and $N_i^k$ denote the numbers of CPUs that the defender and the attacker allocate to device i
APT Defense Game with Mixed Strategy
37
Each player chooses its CPU allocation with randomness to fool the opponent
Mixed strategy: the defender chooses the number of CPUs allocated to each device at random, with $x_{i,j}^k = \Pr\!\left(M_i^k = j\right)$ and $\sum_{j} x_{i,j}^k = 1$ for $1 \le i \le D$; similarly, the attacker uses $y_{i,j}^k = \Pr\!\left(N_i^k = j\right)$ with $\sum_{j} y_{i,j}^k = 1$
Expected utility of the defender/attacker at time k:
$U_D^k = -U_A^k = \mathbb{E}\left[\sum_{i=1}^{D} B_i^k\,\mathrm{sgn}\!\left(M_i^k - N_i^k\right)\right]$
where the expectation is taken over the mixed strategies of both players
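A small sketch (not from the talk) of the data protection level and its expectation under mixed CPU allocations, following the sgn-based formula reconstructed above; the data sizes and candidate allocation vectors are illustrative.

```python
import numpy as np

def data_protection_level(B, M_alloc, N_alloc):
    """Data protection level R at one time slot: the data-size-weighted sign of the
    defender's CPU surplus on each device (reconstructed formula)."""
    B = np.asarray(B, float)
    surplus = np.sign(np.asarray(M_alloc) - np.asarray(N_alloc))
    return float(np.sum(B * surplus) / np.sum(B))

def expected_protection(B, defender_allocs, attacker_allocs, p, q):
    """Expected protection level when both players randomize over allocation vectors."""
    return sum(p_i * q_j * data_protection_level(B, M, N)
               for p_i, M in zip(p, defender_allocs)
               for q_j, N in zip(q, attacker_allocs))

B = [40, 25, 35]                                  # data sizes on D = 3 devices
defender_allocs = [[8, 4, 4], [6, 6, 4]]          # two candidate defender CPU splits (16 CPUs)
attacker_allocs = [[4, 0, 0], [0, 2, 2]]          # two candidate attacker CPU splits (4 CPUs)
print(expected_protection(B, defender_allocs, attacker_allocs, p=[0.5, 0.5], q=[0.5, 0.5]))
```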
NE of the CBG Based APT Game
38
[Figures: data protection level R and utility of the defender versus the total number of defense CPUs S_M, from 600 to 1200, for D = 20, 40, and 80 storage devices]
D storage devices and SM defense CPUs against an APT attacker with 150 CPUs
M. Min, et al., "Defense Against Advanced Persistent Threats: A Colonel Blotto Game Approach," IEEE International Conference on Communications (ICC), 2017.
Outline
Background & motivations
Game theoretic study on APT defense
APT defense game model
Prospect theory based study on APT
Colonel Blotto game based study on APT
Dynamic APT defense game
Reinforcement learning based APT defense solution
Conclusion
39
Dynamic APT Defense Game
Repeated interactions between the defender and the APT attacker, each choosing its attack/scan interval or number of CPUs
The defender usually does not know the attack model
40
Dynamic APT defense game can be viewed as a Markov decision process (MDP)
[Diagram: the APT defender observes the state, chooses a defense strategy, receives a utility, and moves to a new state of the Markov decision process]
Outline
Background & motivations
Game theoretic study on APT defense
APT defense game model
Prospect theory based study on APT
Colonel Blotto game based study on APT
Dynamic APT defense game
Reinforcement learning based APT defense solution
Conclusion
42
Reinforcement Learning
43
Agent uses reinforcement learning such as Q-learning to choose its policy without knowing the system model in a dynamic APT defense game
Achieves the optimal defense policy via trial and error after a sufficiently long time in a finite-state MDP
[Diagram: the learner observes the current state of the environment (MDP), takes an action, and receives a reward and the next state]
The defender applies Q-learning to choose the defense interval without knowing the attack and network models in the dynamic game
Q-learning: a model-free reinforcement learning algorithm for an agent to derive its optimal strategy via trial and error
State: previous attack duration
Q-function: expected long-term discounted utility of taking a given action in a given state
Epsilon-greedy exploration is used to trade off exploration and exploitation in the learning process
Q-learning Based APT Detection
45
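A minimal Q-learning sketch for the defender; the state/action encoding (attack duration level as state, scan interval level as action), the toy environment, and the learning parameters are illustrative assumptions rather than the exact setup of the talk.

```python
import numpy as np

def q_learning_defense(env_step, n_states, n_actions,
                       episodes=5000, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Q-learning for choosing the scan-interval level: state = previous attack
    duration level, action = scan-interval level (both illustrative encodings)."""
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    state = 0
    for _ in range(episodes):
        # Epsilon-greedy: explore with probability eps, otherwise exploit.
        if rng.random() < eps:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward = env_step(state, action)
        # Q-update toward the one-step bootstrapped target.
        Q[state, action] += alpha * (reward + gamma * Q[next_state].max() - Q[state, action])
        state = next_state
    return Q

# Toy environment stub: reward is higher when the scan level matches the attack level.
def toy_env(state, action, n_levels=5):
    next_state = np.random.randint(n_levels)      # next observed attack duration level
    reward = 1.0 - abs(action - state) / n_levels
    return next_state, reward

Q = q_learning_defense(toy_env, n_states=5, n_actions=5)
print(np.argmax(Q, axis=1))                        # learned scan level per observed state
```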
47-48
[Simulation results: attack rate and utility of the defender. The number of non-zero attack interval levels is 5, the attack cost is 0.4, the defense gain is 0.6, and the objective weight of the attacker/defender is 0.8/1.]
L. Xiao, et al., "Cloud Storage Defense Against Advanced Persistent Threats: A Prospect Theoretic Study," IEEE Journal on Selected Areas in Communications, 2017.
Hotbooting PHC based APT Detection
Hotbooting: exploit experiences from similar scenarios to initialize the learning parameters and thus avoid unnecessary exploration at the beginning of learning
PHC: an extension of Q-learning for the mixed-strategy game that uses randomness to fool the APT attacker
Choose the detection interval according to a mixed-strategy table updated with:
49
$\pi\!\left(s^k, x\right) \leftarrow \pi\!\left(s^k, x\right) + \begin{cases} \delta, & \text{if } x = \arg\max_{x'} Q\!\left(s^k, x'\right) \\ -\dfrac{\delta}{|\mathcal{X}| - 1}, & \text{o.w.} \end{cases}$
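A sketch of the policy-hill-climbing (PHC) style update of the mixed-strategy table described above: the probability of the greedy action (with respect to the current Q values) is increased by a small step δ and the other actions are decreased accordingly; the step size and table dimensions are illustrative.

```python
import numpy as np

def phc_update(pi, Q, state, delta=0.01):
    """One PHC step on the mixed-strategy table pi[state] using the current Q values."""
    n_actions = pi.shape[1]
    greedy = int(np.argmax(Q[state]))
    for a in range(n_actions):
        if a == greedy:
            pi[state, a] += delta
        else:
            pi[state, a] -= delta / (n_actions - 1)
    # Project back onto the probability simplex (clip and renormalize).
    pi[state] = np.clip(pi[state], 0.0, 1.0)
    pi[state] /= pi[state].sum()
    return pi

n_states, n_actions = 5, 5
pi = np.full((n_states, n_actions), 1.0 / n_actions)   # start from the uniform mixed strategy
Q = np.random.rand(n_states, n_actions)
pi = phc_update(pi, Q, state=2)
print(np.round(pi[2], 3))
```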
APT Detection Performance
52
Attack cost is 0.51, APT detection gain is 0.4, objective weight and risk aversion coefficient of the security agent are 1, objective weight of the attacker is 0.3, risk seeking coefficient of the attacker is 0.6, and the reference utility of the attacker/security agent is 0
Data Protection Level
53
L. Xiao, et al., "Attacker-Centric View of a Detection Game Against Advanced Persistent Threats," IEEE Trans. Mobile Computing, 2018.
DQN-based CPU Allocation Against APT
The deep Q-network (DQN)-based CPU allocation scheme uses deep learning to compress the state space observed by the defender and further accelerate learning
A CNN is used to estimate the long-term expected reward of each CPU allocation policy for a given state
The CNN parameters are updated over minibatches with stochastic gradient descent
54
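A compact sketch (assuming PyTorch) of the DQN update described above: a small convolutional Q-network maps the defender's state, treated as a 1-D signal, to a Q-value per allocation action, and a minibatch stochastic gradient step moves the network toward the bootstrapped target; the architecture, replay buffer, and hyperparameters are illustrative and may differ from those used in the talk.

```python
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small CNN that maps the defender's state to a Q-value per CPU-allocation action."""
    def __init__(self, state_len, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(8 * state_len, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, x):                  # x: (batch, state_len)
        return self.net(x.unsqueeze(1))    # add the channel dimension for Conv1d

def dqn_minibatch_step(qnet, optimizer, replay, batch_size=32, gamma=0.9):
    """One minibatch SGD step on the squared TD error."""
    batch = random.sample(replay, batch_size)
    s, a, r, s2 = (torch.stack(t) for t in zip(*batch))
    q_sa = qnet(s).gather(1, a.view(-1, 1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * qnet(s2).max(dim=1).values
    loss = nn.functional.mse_loss(q_sa, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return float(loss)

# Toy usage: random transitions for a state of length 6 and 10 allocation actions.
qnet = QNet(state_len=6, n_actions=10)
optimizer = torch.optim.Adam(qnet.parameters(), lr=1e-3)
replay = [(torch.rand(6), torch.randint(10, ()), torch.rand(()), torch.rand(6))
          for _ in range(200)]
for _ in range(5):
    print(dqn_minibatch_step(qnet, optimizer, replay))
```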
Data Protection Level
56
3 storage devices and 16 defense CPUs against an APT attacker with 4 attack CPUs; the data size of each device changes every 1000 time slots
Conclusion
Game theoretic study of APT defense provides insights for the design of secure wireless networks
The PT-based APT defense game shows how the subjectivity of an APT attacker under uncertainty impacts network security
The CBG-based APT defense game investigates how to efficiently allocate CPUs to scan the storage devices to detect APTs
Reinforcement learning based APT defense strategies achieve the optimal detection performance in the dynamic APT defense games
PHC-based defense uses random action selection to fool the attackers
DQN-based defense uses a CNN to compress the state space and thus accelerate learning
Future work
Improve the game model by incorporating more APT details
Accelerate the learning speed of the RL-based APT defense strategies
58