62
Prac%cal A)acks Against Encrypted VoIP Communica%ons HITBSECCONF2013: Malaysia Shaun Colley & Dominic Chell @domchell @mdseclabs

Practical Attacks Against Encrypted VoIP Communications

Embed Size (px)

DESCRIPTION

The slides from MDSec's presentation at HackInTheBox KUL 2013. The presentation describes attacks that can be used to deduce spoken conversations from encrypted VoIP communications. The presentation uses Skype as a case study.

Citation preview

Page 1: Practical Attacks Against Encrypted VoIP Communications

Prac%cal  A)acks  Against  Encrypted  VoIP  Communica%ons  

HITBSECCONF2013:  Malaysia  Shaun  Colley  &  Dominic  Chell  

 @domchell  @mdseclabs  

Page 2: Practical Attacks Against Encrypted VoIP Communications

Agenda  

•  This  is  a  talk  about  traffic  analysis  and  paHern  matching  

•  VoIP  background  •  NLP  techniques  •  StaNsNcal  modeling  •  Case  studies  aka  “the  cool  stuff”  

Page 3: Practical Attacks Against Encrypted VoIP Communications

•  VoIP  is  a  popular  replacement  for  tradiNonal  copper-­‐wire  telephone  systems  

•  Bandwidth  efficient  and  low  cost  •  Privacy  has  become  an  increasing  concern  •  Generally  accepted  that  encrypNon  should  be  used  for  end-­‐to-­‐end  security  

•  But  even  if  it’s  encrypted,  is  it  secure?  

Introduc%on  

Page 4: Practical Attacks Against Encrypted VoIP Communications

Why?  

•  Widespread  accusaNons  of  wiretapping  •  Leaked  documents  allegedly  claim  NSA  &  GCHQ  have  some  “capability”  against  encrypted  VoIP  

•  “The  fact  that  GCHQ  or  a  2nd  Party  partner  has  a  capability  against  a  specific  the  encrypted  used  in  a  class  or  type  of  network  communica@ons  technology.  For  example,  VPNs,  IPSec,  TLS/SSL,  HTTPS,  SSH,  encrypted  chat,  encrypted  VoIP”.  

Page 5: Practical Attacks Against Encrypted VoIP Communications

•  LiHle  work  has  been  done  by  the  security  community  

•  Some  interesNng  academic  research  –  Uncovering  Spoken  Phrases  in  Encrypted  Voice  over  IP  CommunicaNons:  Wright,  Ballard,  Coull,  Monrose,  Masson  

–  Uncovering  Spoken  Phrases  in  Encrypted  VoIP  ConversaNons:  Doychev,  Feld,  Eckhardt,  Neumann  

•  Not  widely  publicised  •  No  proof  of  concepts  

Previous  Work  

Page 6: Practical Attacks Against Encrypted VoIP Communications

Background:  VoIP  

Page 7: Practical Attacks Against Encrypted VoIP Communications

•  Similar  to  tradiNonal  digital  telephony,  VoIP  involves  signalling,  session  iniNalisaNon  and  setup  as  well  as  encoding  of  the  voice  signal  

•  Separated  in  to  two  channels  that  perform  these  acNons:  – Control  channel  – Data  channel  

VoIP  Communica%ons  

Page 8: Practical Attacks Against Encrypted VoIP Communications

•  Operates  at  the  applicaNon-­‐layer  •  Handles  call  setup,  terminaNon  and  other  essenNal  aspects  of  the  call  

•  Uses  a  signalling  protocol  such  as:  – Session  IniNaNon  Protocol  (SIP)  – Extensible  Messaging  and  Presence  Protocol  (XMPP)  

– H.323  – Skype  

Control  Channel  

Page 9: Practical Attacks Against Encrypted VoIP Communications

Control  Channel  

•  Handles  sensiNve  call  data  such  as  source  and  desNnaNon  endpoints,  and  can  be  used  for  modifying  exisNng  calls  

•  Typically  protected  with  encrypNon,  for  example  SIPS  which  adds  TLS  

•  Ocen  used  to  establish  the  the  direct  data  connecNon  for  the  voice  traffic  in  the  data  channel  

Page 10: Practical Attacks Against Encrypted VoIP Communications

•  The  primary  focus  of  our  research  •  Used  to  transmit  encoded  and  compressed  voice  data  

•  Typically  over  UDP  •  Voice  data  is  transported  using  a  transport  protocol  such  as  RTP  

 

Data  Channels  

Page 11: Practical Attacks Against Encrypted VoIP Communications

•  Commonplace  for  VoIP  implementaNons  to  encrypt  the  data  flow  for  confidenNality  

•  A  common  implementaNon  is  Secure  Real-­‐Time  Transport  Protocol  (SRTP)  

•  By  default  will  preserve  the  original  RTP  payload  size  

•  “None  of  the  pre-­‐defined  encryp@on  transforms  uses  any  padding;  for  these,  the  RTP  and  SRTP  payload  sizes  match  exactly.”  

Data  Channels  

Page 12: Practical Attacks Against Encrypted VoIP Communications

Background:  Codecs  

Page 13: Practical Attacks Against Encrypted VoIP Communications

•  Used  to  convert  the  analogue  voice  signal  in  to  a  digitally  encoded  and  compressed  representaNon  

•  Codecs  strike  a  balance  between  bandwidth  limitaNons  and  voice  quality  

•  We’re  mostly  interested  in  Variable  Bit  Rate  (VBR)  codecs  

Codecs  

Page 14: Practical Attacks Against Encrypted VoIP Communications

•  The  codec  can  dynamically  modify  the  bitrate  of  the  transmiHed  stream  

•  Codecs  like  Speex  will  encode  sounds  at  different  bitrates  

•  For  example,  fricaNves  may  be  encoded  at    lower  bitrates  than  vowels  

 

Variable  Bitrate  Codecs  

Page 15: Practical Attacks Against Encrypted VoIP Communications
Page 16: Practical Attacks Against Encrypted VoIP Communications

•  The  primary  benefit  from  VBR  is  a  significantly  beHer  quality-­‐to-­‐bandwidth  raNo  compared  to  CBR  

•  Desirable  in  low  bandwidth  environments  – Cellular  – Slow  WiFi  

Variable  Bitrate  Codecs  

Page 17: Practical Attacks Against Encrypted VoIP Communications

Background:  NLP  and  Sta%s%cal  Analysis  

Page 18: Practical Attacks Against Encrypted VoIP Communications

•  Research  techniques  borrowed  from  NLP  and  bioinformaNcs  

•  Primarily  the  use  of:  – Profile  Hidden  Markov  Models  – Dynamic  Time  Warping  

Natural  Language  Processing  

Page 19: Practical Attacks Against Encrypted VoIP Communications

•  StaNsNcal  model  that  assigns  probabiliNes  to  sequences  of  symbols  

•  TransiNons  from  Begin  state  (B)  to  End  state  (E)  

•  Moves  from  state  to  state  randomly  but  in  line  with  transiNon  distribuNons  

•  TransiNons  occur  independently  of  any  previous  choices  

Hidden  Markov  Models  

Page 20: Practical Attacks Against Encrypted VoIP Communications

•  The  model  will  conNnue  to  move  between  states  and  output  symbols  unNl  the  End  state  is  reached  

•  The  emiHed  symbols  consNtute  the  sequence  

Hidden  Markov  Models  

Image  from  hHp://isabel-­‐drost.de/hadoop/slides/HMM.pdf  

Page 21: Practical Attacks Against Encrypted VoIP Communications

•  A  number  of  possible  state  paths  from  B  to  E  •  Best  path  is  the  most  likely  path  •  The  Viterbi  algorithm  can  be  used  to  discover  the  most  probable  path  

•  Viterbi,  Forward  and  Backward  algorithms  can  all  be  used  to  determine  probability  that  a  model  produced  an  output  sequence  

Hidden  Markov  Models  

Page 22: Practical Attacks Against Encrypted VoIP Communications

•  The  model  can  be  “trained”  by  a  collecNon  of  output  sequences  

•  The  Baum-­‐Welch  algorithm  can  be  used  to  determine  probability  of  a  sequence  based  on  previous  sequences  

•  In  the  context  of  our  research,  packet  lengths  can  be  used  as  the  sequences  

Hidden  Markov  Models  

Page 23: Practical Attacks Against Encrypted VoIP Communications

•  A  variaNon  of  HMM  •  Introduces  Insert  and  Deletes  •  Allows  the  model  to  idenNfy  sequences  with  Inserts  or  Deletes  

•  ParNcularly  relevant  to  analysis  of  audio  codecs  where  idenNcal  uHerances  of  the  same  phrase  by  the  same  speaker  are  unlikely  to  have  idenNcal  paHerns  

Profile  Hidden  Markov  Models  

Page 24: Practical Attacks Against Encrypted VoIP Communications

•  Consider  a  model  trained  to  recognise:  A  B  C  D  

 •  The  model  can  sNll  recognise  paHerns  with  inser&on:  

A  B  X  C  D    

•  Or  paHerns  with  dele&on:  A  B  C  

Profile  Hidden  Markov  Models  

Page 25: Practical Attacks Against Encrypted VoIP Communications

•  Largely  replaced  by  HMMs  •  Measures  similarity  in  sequences  that  vary  in  Nme  or  speed  

•  Commonly  used  in  speech  recogniNon  •  Useful  in  our  research  because  of  the  temporal  element  

•  A  packet  capture  is  essenNally  a  Nme  series  

Dynamic  Time  Warping  

Page 26: Practical Attacks Against Encrypted VoIP Communications

•  Computes  a  ‘distance’  between  two  Nme  series  –  DTW  distance  

•  Different  to  Euclidean  distance  

•  The  DTW  distance  can  be  used  as  a  metric  for  ‘closeness’  between  the  two  Nme  series  

Dynamic  Time  Warping  

Page 27: Practical Attacks Against Encrypted VoIP Communications

Dynamic  Time  Warping  -­‐  Example  •  Consider  the  following  sequences:  

–  0  0  0  4  7  14  26  23  8  3  2  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    –  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  5  6  13  25  24  9  4  2  0  0  0  0  0    

•  IniNal  analysis  suggests  they  are  very  different,  if  comparing  from  the  entry  points.  

•  However  there  are  some  similar  characterisNcs:  –  Similar  shape  –  Peaks  at  around  25  –  Could  represent  the  same  sequence,  but  at  different  Nme  

offsets?  

0  

5  

10  

15  

20  

25  

30  

1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  

Series1  

Series3  

Page 28: Practical Attacks Against Encrypted VoIP Communications

Side  Channel  A)acks  

Page 29: Practical Attacks Against Encrypted VoIP Communications

•  Usually  connecNons  are  peer-­‐to-­‐peer  

•  We  assume  that  encrypted  VoIP  traffic  can  be  captured:  –  Man-­‐in-­‐the-­‐middle  –  Passive  monitoring  

•  Not  beyond  the  realms  of  possibility:  –  “GCHQ  taps  fibre-­‐opNc  cables”  hHp://www.theguardian.com/uk/2013/jun/21/gchq-­‐cables-­‐secret-­‐world-­‐communicaNons-­‐nsa  

–  “China  hijacked  Internet  traffic”hHp://www.zdnet.com/china-­‐hijacked-­‐uk-­‐internet-­‐traffic-­‐says-­‐mcafee-­‐3040090910/  

Side  Channel  A)acks  

Page 30: Practical Attacks Against Encrypted VoIP Communications

•  But  what  can  we  get  from  just  a  packet  capture?  

Side  Channel  A)acks  

Page 31: Practical Attacks Against Encrypted VoIP Communications

•  Source  and  DesNnaNon  endpoints  – Educated  guess  at  language  being  spoken  

•  Packet  lengths  

•  Timestamps  

Side  Channel  A)acks  

Page 32: Practical Attacks Against Encrypted VoIP Communications

•  So  what?......  

•  We  now  know  VBR  codecs  encode  different  sounds  at  variable  bit  rates  

•  We  now  know  some  VoIP  implementaNons  use  a  length  preserving  cipher  to  encrypt  voice  data  

Side  Channel  A)acks  

Page 33: Practical Attacks Against Encrypted VoIP Communications

 Variable  Bit  Rate  Codec                                            +  Length  Preserving  Cipher  =  

Side  Channel  A)acks  

Page 34: Practical Attacks Against Encrypted VoIP Communications

Case  Study  

Page 35: Practical Attacks Against Encrypted VoIP Communications

•  ConnecNons  are  peer-­‐to-­‐peer  •  Uses  the  Opus  codec  (RFC  6716):  

“Opus  is  more  efficient  when  opera@ng  with  variable  bitrate  (VBR)  which  is  the  default”  

•  Skype  uses  AES  encrypNon  in  integer  counter  mode  

•  The  resulNng  packets  are  not  padded  up  to  size  boundaries  

Skype  Case  Study  

Page 36: Practical Attacks Against Encrypted VoIP Communications

Skype  Case  Study  

Page 37: Practical Attacks Against Encrypted VoIP Communications

•  Although  similar  phrases  will  produce  a  similar  paHern,  they  won’t  be  idenNcal:  – Background  noise  – Accents  – Speed  at  which  they’re  spoken  

•  Simple  substring  matching  won’t  work!  

Skype  Case  Study  

Page 38: Practical Attacks Against Encrypted VoIP Communications

•  The  two  approaches  we  chose  make  use  of  the  NLP  techniques:  – Profile  Hidden  Markov  Models  – Dynamic  Time  Warping  

Skype  Case  Study  

Page 39: Practical Attacks Against Encrypted VoIP Communications

•  Both  approaches  are  similar  and  can  be  broken  down  in  the  following  steps:  –  Train  the  model  for  the  target  phrase  –  Capture  the  Skype  traffic  –  “Ask”  the  model  if  it’s  likely  to  contain  the  target  phrase  

Skype  Case  Study  

Page 40: Practical Attacks Against Encrypted VoIP Communications

•  To  “train”  the  model,  a  lot  of  test  data  is  required  

•  We  used  the  TIMIT  Corpus  data  

•  Recordings  of  630  speakers  of  eight  major  dialects  of  American  English  

•  Each  speaker  reads  a  number  of  “phoneNcally  rich”  sentences  

Skype  Case  Study  -­‐  Training  

Page 41: Practical Attacks Against Encrypted VoIP Communications

“Why  do  we  need  bigger  and  beHer  bombs?”      

Skype  Case  Study  -­‐  TIMIT  

Page 42: Practical Attacks Against Encrypted VoIP Communications

“He  ripped  down  the  cellophane  carefully,  and  laid  three  dogs  on  the  Nn  foil.”      

Skype  Case  Study  -­‐  TIMIT  

Page 43: Practical Attacks Against Encrypted VoIP Communications

“That  worm  a  murderer?”      

Skype  Case  Study  -­‐  TIMIT  

Page 44: Practical Attacks Against Encrypted VoIP Communications

•  To  collect  the  data  we  played  each  of  the  phrases  over  a  Skype  session  and  logged  the  packets  using  tcpdump  

for((a=0;a<400;a++)); do /Applications/VLC.app/Contents/MacOS/VLC --no-repeat -I rc --play-and-exit $a.rif ; echo "$a " ; sleep 5 ; done !

Skype  Case  Study  -­‐  Training  

Page 45: Practical Attacks Against Encrypted VoIP Communications

•  PCAP  file  containing  ~400  occurrences  of  the  same  spoken  phrase  

•  “Silence”  must  be  parsed  out  and  removed  

•  Fairly  easy  -­‐  generally,  silence  observed  to  be  less  than  80  bytes  

•  Unknown  spikes  to  ~100  during  silence  phases  

Skype  Case  Study  -­‐  Training  

Page 46: Practical Attacks Against Encrypted VoIP Communications

Skype  Case  Study  -­‐  Silence  

Short  excerpt  of  Skype  traffic  of  the  same  recording  captured  3  Nmes,  each  separated  by  5  seconds  of  silence:  

Page 47: Practical Attacks Against Encrypted VoIP Communications

Approach  to  idenNfy  and  remove  the  silence:  

–  Find  sequences  of  packets  below  the  silence  threshold,  ~80  bytes  

–  Ignore  spikes  when  we’re  in  a  silence  phase  (i.e.  20  conNnuous  packets  below  the  silence  threshold)  

–  Delete  the  silence  phase  –  Insert  a  marker  to  separate  the  speech  phases  –  integer  222,  in  our  case  

–  This  leaves  us  with  just  the  speech  phases…..  

Skype  Case  Study  -­‐  Silence  

Page 48: Practical Attacks Against Encrypted VoIP Communications

Skype  Case  Study  -­‐  Silence  

Page 49: Practical Attacks Against Encrypted VoIP Communications

•  Biojava  provides  a  useful  open  source  framework  –  Classes  for  Profile  HMM  modeling  –  BaumWelch  for  training  –  A  dynamic  matrix  programming  class  (DP)  for  calling  into  Viterbi  for  sequence  analysis  on  the  PHMM  

•  We  chose  this  library  to  implement  our  aHack  

Skype  Case  Study  –  PHMM  A)ack    

Page 50: Practical Attacks Against Encrypted VoIP Communications

•  Train  the  ProfileHMM  object  using  the  Baum  Welch  

•  Query  Viterbi  to  calculate  a  log-­‐odds  

•  Compare  the  log-­‐odds  score  to  a  threshold  

•  If  above  threshold  we  have  a  possible  match  

•  If  not,  the  packet  sequence  was  probably  not  the  target  phrase  

Skype  Case  Study  –  PHMM  A)ack    

Page 51: Practical Attacks Against Encrypted VoIP Communications

•  Same  training  data  as  PHMM  •  Remove  silence  phases  •  Take  a  prototypical  sequence  and  calculate  DTW  distance  of  all  training  data  from  it  

•  Determine  a  typical  distance  threshold  •  Calculate  DTW  distance  for  test  sequence  and  compare  to  threshold  

•  If  the  distance  is  within  the  threshold  then  likely  match  

Skype  Case  Study  –  DTW  A)ack    

Page 52: Practical Attacks Against Encrypted VoIP Communications

PHMM  Demonstra%on  

Page 53: Practical Attacks Against Encrypted VoIP Communications

Skype  Case  Study  –  Pre  Tes%ng        

Page 54: Practical Attacks Against Encrypted VoIP Communications

Skype  Case  Study  –  Post  Tes%ng  Cypher:  “I  don’t  even  see  the  code.  All  I  see  is  blonde,  bruneHe,  red-­‐head”  

Page 55: Practical Attacks Against Encrypted VoIP Communications

•  Recall  rate  of  approximately  80%  

•  False  posiNve  rate  of  approximately  20%  

•  PhoneNcally  richer  phrases  will  yield  lower  false  posiNves  

•  TIMIT  corpus:  “Young  children  should  avoid  exposure  to  contagious  diseases”  

PHMM  Sta%s%cs  

Page 56: Practical Attacks Against Encrypted VoIP Communications

DTW  Results  

•  Similarly  to  PHMM  results,  ~80%  recall  rate  

•  False  posiNve  rate  of  20%  and  under  –  again,  as  long  as  your  training  data  is  good.  

Page 57: Practical Attacks Against Encrypted VoIP Communications

Silent  Circle  -­‐  Results  

•  Not  vulnerable  –  all  data  payload  lengths  are  176  bytes  in  length!  

Page 58: Practical Attacks Against Encrypted VoIP Communications

Wrapping  up  

Page 59: Practical Attacks Against Encrypted VoIP Communications

•  Some  guidance  in  RFC656216    

•  Padding  the  RTP  payload  can  provide  a  reducNon  in  informaNon  leakage  

•  Constant  bitrate  codecs  should  be  negoNated  during  session  iniNaNon  

Preven%on  

Page 60: Practical Attacks Against Encrypted VoIP Communications

•  Assess  other  implementaNons  –  Google  Talk  – Microsoc  Lync  –  Avaya  VoIP  phones  –  Cisco  VoIP  phones  –  Apple  FaceTime  

•  According  to  Wikipedia,  uses  RTP  and  SRTP…Vulnerable?  

•  Improvements  to  the  algorithms  -­‐  Apply  the  Kalman  filter?  

Further  work  

Page 61: Practical Attacks Against Encrypted VoIP Communications

•  Variable  bitrate  codecs  are  unsafe  for  sensiNve  VoIP  transmission  

•  It  is  possible  to  deduce  spoken  conversaNons  in  encrypted  VoIP  

•  VBR  with  length  preserving  encrypted  transports  like  SRTP  should  be  avoided  

•  Constant  bitrate  codecs  should  be  used  where  possible  

Conclusions  

Page 62: Practical Attacks Against Encrypted VoIP Communications

@domchell  @MDSecLabs