58
Sta$c Analysis Dataflow Analysis

Sta$c&Analysis& - Northwestern University

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Sta$c&Analysis& - Northwestern University

Sta$c  Analysis                              Dataflow  Analysis

Page 2: Sta$c&Analysis& - Northwestern University

Roadmap

•  Overview.  •  Four  Analysis  Examples.  •  Analysis  Framework  –  Soot.  •  Theore>cal  Abstrac>on  of  Dataflow  Analysis.  •  Inter-­‐procedure  Analysis.  •  Taint  Analysis.  

Page 3: Sta$c&Analysis& - Northwestern University

Overview

•  Sta>c  analysis  is  a  program  analysis  technique  performed  without  actually  execu>ng  programs.  

•  Data  flow  analysis  is  a  process  of  deriving  informa>on  about  the  run  >me  behavior  of  a  program.    

•  Usage:  compiler,  IDE  and  security.  

Page 4: Sta$c&Analysis& - Northwestern University

SSA •  Requires  that  each  variable  is  assigned  exactly  once.  

•  Def-­‐use  chain:  – Def-­‐use  chains  are  used  to  propagate  data  flow  informa>on.  

–  The  analysis  algorithm  takes  >me  propor>onal  to  the  product  of  the  total  number  of  def-­‐use  edges    

•  Benefits:    – Data  flow  analysis  could  be  easier  and  faster.  –  Reduce  the  number  of  def-­‐use  chains.  (m*n  vs  m+n)  

y  :=  1  y  :=  2  x  :=  y

y1  :=  1  y2  :=  2  x1  :=  y2

Page 5: Sta$c&Analysis& - Northwestern University
Page 6: Sta$c&Analysis& - Northwestern University
Page 7: Sta$c&Analysis& - Northwestern University

Control  Flow  Graph  (CFG)

•  A  control  flow  graph  is  a  representa>on  of  a  program  that  makes  certain  analyses  (including  dataflow  analyses)  easier.  

•  Usually  built  on  Intermediate  representa>on:    –  Single  sta>c  assignment  (SSA)  form.  

•  Statements  may  be  –  Assignments:  x  :=  y  or  x  :=  y  op  z  or  x  :=  op  y  –  Branches:  goto  L  or  if  b  then  goto  L  

•  A  directed  graph  where  –  Each  node  represents  a  statement  –  Edges  represent  control  flow  

Page 8: Sta$c&Analysis& - Northwestern University

IN  point

OUT  point

Page 9: Sta$c&Analysis& - Northwestern University

Roadmap

•  Overview.  •  Four  Analysis  Examples.  •  Analysis  Framework  –  Soot.  •  Theore>cal  Abstrac>on  of  Dataflow  Analysis.  •  Inter-­‐procedure  Analysis.  •  Taint  Analysis.  

Page 10: Sta$c&Analysis& - Northwestern University

Example

•  Available  expressions  •  Reaching  defini>ons    •  Live  variables    •  Very  busy  expressions    

•  Data-­‐flow-­‐analysis-­‐example.pdf  

Page 11: Sta$c&Analysis& - Northwestern University

Blackboard

Page 12: Sta$c&Analysis& - Northwestern University

X  (prerequisite  knowledge) x

a,  x,  y a,  x,  y

a,  x,  b a,  x,  b

a,  b

a,  x,  y

a,  b,  y a,  b,  y

a,  b,  y a,  b,  x,  y

a,  b,  x,  y

a,  b,  x,  y

a,  b,  x

a,  b,  x,  y

a,  b,  y

1.

2.

3.

4.

5.

6.

Page 13: Sta$c&Analysis& - Northwestern University

Forward  Must  Dataflow  Algorithm  

Full  set

Page 14: Sta$c&Analysis& - Northwestern University

Roadmap

•  Overview.  •  Four  Analysis  Examples.  •  Analysis  Framework  –  Soot.  •  Theore>cal  Abstrac>on  of  Dataflow  Analysis.  •  Inter-­‐procedure  Analysis.  •  Taint  Analysis.  

Page 15: Sta$c&Analysis& - Northwestern University

Use  Framework  to  Implement  Analysis

•  Soot  is  a  framework  to  analyze  and  op>mize  Java  or  Android  programs.

Page 16: Sta$c&Analysis& - Northwestern University

Four  Steps  to  Use  Soot

•  Forward  or  backward?    •  Decide  what  you  are  approxima>ng.  What  is  the  domain’s  confluence  operator?  (Union  or  Intersec>on)  

•  Write  equa>on  for  each  kind  of  IR  statement.    •  State  the  star>ng  approxima>on.  (Ini>al  value  of  each  set)  

Page 17: Sta$c&Analysis& - Northwestern University

HOWTO:  Implemen>ng  Soot  Flow  Analysis

•  Subclass  ForwardFlowAnalysis  or  BackwardFlowAnalysis.  

•  Implement  merge(),  copy()    •  Implement  flow  func>on:  flowThrough()    •  Implement  ini>al  values:  newIni>alFlow()  and  entryIni>alFlow()    

•  Implement  constructor  (it  must  call  doAnalysis())    

Page 18: Sta$c&Analysis& - Northwestern University

Soot  Example:  Live  Variable

Page 19: Sta$c&Analysis& - Northwestern University

Step1.  Forward  or  Backward

Page 20: Sta$c&Analysis& - Northwestern University

Step2.  Abstrac>on  Domain

Page 21: Sta$c&Analysis& - Northwestern University

Step  3.  Implemen>ng  an  Abstrac>on

Page 22: Sta$c&Analysis& - Northwestern University

Step  3.  Implemen>ng  an  Abstrac>on

Page 23: Sta$c&Analysis& - Northwestern University

Implemen>ng  Copy

Page 24: Sta$c&Analysis& - Northwestern University

Implemen>ng  Merge

Page 25: Sta$c&Analysis& - Northwestern University

Flow  Equa>ons

Page 26: Sta$c&Analysis& - Northwestern University

Implemen>ng  Flow  Func>on:  Cas>ng  (1)

Page 27: Sta$c&Analysis& - Northwestern University

Implemen>ng  Flow  Func>on:  Copying  (2)

Page 28: Sta$c&Analysis& - Northwestern University

Implemen>ng  Flow  Func>on:  Removing  Kills  (3)

Page 29: Sta$c&Analysis& - Northwestern University

Implemen>ng  Flow  Func>on:    Adding  Gens  (4)

Page 30: Sta$c&Analysis& - Northwestern University

Step  4.  Ini>al  Value.

Page 31: Sta$c&Analysis& - Northwestern University

Step  5.  Implement  Constructor

Page 32: Sta$c&Analysis& - Northwestern University

Enjoy:  Flow  Analysis  Results  

Page 33: Sta$c&Analysis& - Northwestern University

Roadmap

•  Overview.  •  Four  Analysis  Examples.  •  Analysis  Framework  –  Soot.  •  Theore$cal  Abstrac$on  of  Dataflow  Analysis.  •  Inter-­‐procedure  Analysis.  •  Taint  Analysis.  

Page 34: Sta$c&Analysis& - Northwestern University

Theore>cal  Abstrac>on  of  Dataflow  Analysis

•  Model  dataflow  analysis  is  important:  – Whether  the  algorithm  will  terminate.  – Which  analysis  converges  faster.  

•  There  is  an  order  between  the  values  a  data  flow  variable  takes  in  two  consecu>ve  itera>ons.  

•  A  general  way  to  express  an  order  between  objects  is  to  embed  them  in  a  mathema>cal  structure  called  a  lafce.    

Page 35: Sta$c&Analysis& - Northwestern University

Lafce •  A  par$al  order  ⊑  on  a  set  S  is  a  rela>on  over  S  ×  S  that  is  –  Reflexive.  For  all  elements  x∈S  :  x⊑x.  –  Transi>ve.  For  all  elements  x,y,z  ∈  S  :  x⊑y  and  y⊑z  implies  x⊑z.  

– An>-­‐symmetric.  For  all  elements  x,y  ∈  S  :  x⊑y  and  y⊑x  implies  x  =  y.  

•  A  par$ally  ordered  set,  denoted  by  (S  ,  ⊑  ),  is  a  set  S  with  a  par>al  order  ⊑.  

•  LaLce:  a  par>ally  ordered  set  with  unique  least  upper  bounds  and  greatest  lower  bounds.

Page 36: Sta$c&Analysis& - Northwestern University

Modeling  Data  Flow  Values  Using  Lafces

The  lafce  for  data  flow  values  in  live  variables  analysis  

The  rela>on  ⊑  can  be  interpreted  as  “a  conserva>ve  (safe)  approxima>on  of”.  

Page 37: Sta$c&Analysis& - Northwestern University

Modeling  Flow  Func$ons  

Page 38: Sta$c&Analysis& - Northwestern University

Two  important  proper>es  of  flow  func>ons

Determine  if  the  analysis  could  terminate.

Determine  if  the  analysis  could  run  faster.

Page 39: Sta$c&Analysis& - Northwestern University

•  Distribu>ve  problems:  – Live  variables  – Available  expressions    – Reaching  defini>ons  – Very  busy  expressions  

•  Non-­‐distribu>ve  problem:

A:  {x=2,  y=1}

B:  {x=1,  y=2} FlowFunc>on(A  Π  B)  =  {A,B,C  unknown}

FlowFunc>on(A)  Π  FlowFunc>on  (B)  =    {A,B  Unknow,  Z=3}

Page 40: Sta$c&Analysis& - Northwestern University

Roadmap

•  Overview.  •  Example.  •  Theore>cal  Abstrac>on  of  Dataflow  Analysis.  •  Inter-­‐procedure  Analysis.  •  Taint  Analysis.  •  Analysis  Framework  –  Soot.  

Page 41: Sta$c&Analysis& - Northwestern University

Inter-­‐procedure  Analysis

Page 42: Sta$c&Analysis& - Northwestern University

X:=7

a:=7

a:=7,  y:=0

r:=7

x:=7

a:=    (7  Π  17)

a:=Not  Constant  y:=Not  Constant

z:=Not  Constant  

Top  Value:  UNDEF  Bopom  Value:  NOT  Constant

Page 43: Sta$c&Analysis& - Northwestern University

A  quick  fix

Page 44: Sta$c&Analysis& - Northwestern University

Context-­‐based  Inter-­‐procedure  analysis

•  Solu>on:  make  a  finite  number  of  copies  •  Use  context  informa>on  to  determine  when  to  share  a  copy  

•  Choice  of  what  to  use  for  context  will  produce  different  tradeoffs  between  precision  and  scalability  

•  Common  choice:  –   Call  site  –   Parameter  value

Page 45: Sta$c&Analysis& - Northwestern University

Based  on  Call  Stack  Depth  1

Page 46: Sta$c&Analysis& - Northwestern University

Based  on  Call  Stack  Depth  2

Page 47: Sta$c&Analysis& - Northwestern University

Based  on  Parameter  Value

Page 48: Sta$c&Analysis& - Northwestern University

Roadmap

•  Overview.  •  Example.  •  Theore>cal  Abstrac>on  of  Dataflow  Analysis.  •  Inter-­‐procedure  Analysis.  •  Taint  Analysis.  •  Analysis  Framework  –  Soot.  

Page 49: Sta$c&Analysis& - Northwestern University

Taint  Analysis •  Follow  any  applica>on  inside  a  debugger  and  you  will  see  that  data  informa>on  is  being  copied  and  modified  all  the  >me.  In  another  words,  informa>on  is  always  moving.  

•  Taint  analysis  can  be  seen  as  a  form  of  Informa>on  Flow  Analysis.  –  Insert  some  kind  of  tag  or  label  for  data  we  are  interested  in.  (taint  the  data)  

–  Track  the  influence  of  the  tainted  object  along  the  execu>on  of  the  program.  

–  Taint  relevant  data.  – Obverse  if  it  flows  to  sensi>ve  func>ons  (sink).

Page 50: Sta$c&Analysis& - Northwestern University

Taint  Analysis •  Two  usage  in  security:  – Finding  informa$on  leakage.  – Finding  program  vulnerability.  

•  For  informa>on  leakage:  –  If  a  data  (variable)  contains  user  secrets  (e.g.,  loca>on,  contacts),  we  will  taint  such  data.  

– Taint  the  variables  whose  data  depend  on  tainted  value.  (e.g.,  a  :=  b  +  x)  

– Observe  if  the  tainted  data  will  flow  to  func>ons  that  might  send  data  to  other  places.

Page 51: Sta$c&Analysis& - Northwestern University
Page 52: Sta$c&Analysis& - Northwestern University

Taint  Analysis

•  Two  usage  in  security:  – Finding  informa>on  leakage.  – Finding  program  vulnerability  (code  injec$on).  

•  Applica>on  vulnerability:  – A  lot  of  vulnerabili>es  are  caused  by  unchecked  input  from  user  (apack)  sent  to  sensi>ve  func>ons.

Page 53: Sta$c&Analysis& - Northwestern University

“<script>  alert(1)</script>”

XSS  vulnerability

Page 54: Sta$c&Analysis& - Northwestern University

SELECT * FROM `login` WHERE `user`= ‘’ OR ‘a’=‘a’ AND `pass`=‘’ OR ‘a’=‘a’

Page 55: Sta$c&Analysis& - Northwestern University

Buffer  Overflow  Vulnerability

Page 56: Sta$c&Analysis& - Northwestern University

Taint  Analysis •  Two  usage  in  security:  –  Finding  informa>on  leakage.  –  Finding  program  vulnerability  (code  injec$on).  

•  Applica>on  vulnerability:  – A  lot  of  vulnerabili>es  are  caused  by  unchecked  input  from  user  (apack)  sent  to  sensi>ve  func>ons.  

–  If  the  source  of  a  object  X’s  value  is  untrusted,  we  say  X  is  tainted.  

–  Taint  the  variables  whose  data  depend  on  tainted  value.  (e.g.,  a  :=  b  +  x)  

– Observe  if  the  tainted  data  will  flow  to  dangerous  func>ons  that  might  lead  to  execu>on  its  parameters.

Page 57: Sta$c&Analysis& - Northwestern University

Taint  Analysis  Works •  Android  App  Informa>on  Leakage:  – FlowDroid.  

•  JavaScript:  Firefox  Extension  Vulnerability:  – Bandhakavi,  Sruthi,  et  al.  "VEX:  Vefng  browser  extensions  for  security  vulnerabili>es."  Usenix  Security.  2010.  

•  Php:  Web  Applica>on  Vulnerability:  –  Jovanovic,  Nenad,  Christopher  Kruegel,  and  Engin  Kirda.  "Pixy:  A  sta>c  analysis  tool  for  detec>ng  web  applica>on  vulnerabili>es.”  Oakland,  2006  

Page 58: Sta$c&Analysis& - Northwestern University

References •  Soot  Tutorial:  hpps://github.com/Sable/soot/wiki/Tutorials  

•  Interprocedural  Data  Flow  Analysis  in  Soot  using  Value  Contexts:  hpps://arxiv.org/pdf/1304.6274.pdf  

•  Harvard  Advanced  Programming  Language:  hpp://www.seas.harvard.edu/courses/cs252/2011sp/  

•  Textbool:  Data  Flow  Analysis:  Theory  and  Prac>ce:    hpps://www.amazon.com/Data-­‐Flow-­‐Analysis-­‐Theory-­‐Prac>ce/dp/0849328802  

•  Course:  Professor  Finddler’s  programming  Languages  seminar.  

•  Course:  Professor  Campanoni’s  code  analysis  and  transforma>on.