Upload
iphonepentest
View
1.723
Download
1
Embed Size (px)
DESCRIPTION
The slides from MDSec's presentation at HackInTheBox KUL 2013. The presentation describes attacks that can be used to deduce spoken conversations from encrypted VoIP communications. The presentation uses Skype as a case study.
Citation preview
Prac%cal A)acks Against Encrypted VoIP Communica%ons
HITBSECCONF2013: Malaysia Shaun Colley & Dominic Chell
@domchell @mdseclabs
Agenda
• This is a talk about traffic analysis and paHern matching
• VoIP background • NLP techniques • StaNsNcal modeling • Case studies aka “the cool stuff”
• VoIP is a popular replacement for tradiNonal copper-‐wire telephone systems
• Bandwidth efficient and low cost • Privacy has become an increasing concern • Generally accepted that encrypNon should be used for end-‐to-‐end security
• But even if it’s encrypted, is it secure?
Introduc%on
Why?
• Widespread accusaNons of wiretapping • Leaked documents allegedly claim NSA & GCHQ have some “capability” against encrypted VoIP
• “The fact that GCHQ or a 2nd Party partner has a capability against a specific the encrypted used in a class or type of network communica@ons technology. For example, VPNs, IPSec, TLS/SSL, HTTPS, SSH, encrypted chat, encrypted VoIP”.
• LiHle work has been done by the security community
• Some interesNng academic research – Uncovering Spoken Phrases in Encrypted Voice over IP CommunicaNons: Wright, Ballard, Coull, Monrose, Masson
– Uncovering Spoken Phrases in Encrypted VoIP ConversaNons: Doychev, Feld, Eckhardt, Neumann
• Not widely publicised • No proof of concepts
Previous Work
Background: VoIP
• Similar to tradiNonal digital telephony, VoIP involves signalling, session iniNalisaNon and setup as well as encoding of the voice signal
• Separated in to two channels that perform these acNons: – Control channel – Data channel
VoIP Communica%ons
• Operates at the applicaNon-‐layer • Handles call setup, terminaNon and other essenNal aspects of the call
• Uses a signalling protocol such as: – Session IniNaNon Protocol (SIP) – Extensible Messaging and Presence Protocol (XMPP)
– H.323 – Skype
Control Channel
Control Channel
• Handles sensiNve call data such as source and desNnaNon endpoints, and can be used for modifying exisNng calls
• Typically protected with encrypNon, for example SIPS which adds TLS
• Ocen used to establish the the direct data connecNon for the voice traffic in the data channel
• The primary focus of our research • Used to transmit encoded and compressed voice data
• Typically over UDP • Voice data is transported using a transport protocol such as RTP
Data Channels
• Commonplace for VoIP implementaNons to encrypt the data flow for confidenNality
• A common implementaNon is Secure Real-‐Time Transport Protocol (SRTP)
• By default will preserve the original RTP payload size
• “None of the pre-‐defined encryp@on transforms uses any padding; for these, the RTP and SRTP payload sizes match exactly.”
Data Channels
Background: Codecs
• Used to convert the analogue voice signal in to a digitally encoded and compressed representaNon
• Codecs strike a balance between bandwidth limitaNons and voice quality
• We’re mostly interested in Variable Bit Rate (VBR) codecs
Codecs
• The codec can dynamically modify the bitrate of the transmiHed stream
• Codecs like Speex will encode sounds at different bitrates
• For example, fricaNves may be encoded at lower bitrates than vowels
Variable Bitrate Codecs
• The primary benefit from VBR is a significantly beHer quality-‐to-‐bandwidth raNo compared to CBR
• Desirable in low bandwidth environments – Cellular – Slow WiFi
Variable Bitrate Codecs
Background: NLP and Sta%s%cal Analysis
• Research techniques borrowed from NLP and bioinformaNcs
• Primarily the use of: – Profile Hidden Markov Models – Dynamic Time Warping
Natural Language Processing
• StaNsNcal model that assigns probabiliNes to sequences of symbols
• TransiNons from Begin state (B) to End state (E)
• Moves from state to state randomly but in line with transiNon distribuNons
• TransiNons occur independently of any previous choices
Hidden Markov Models
• The model will conNnue to move between states and output symbols unNl the End state is reached
• The emiHed symbols consNtute the sequence
Hidden Markov Models
Image from hHp://isabel-‐drost.de/hadoop/slides/HMM.pdf
• A number of possible state paths from B to E • Best path is the most likely path • The Viterbi algorithm can be used to discover the most probable path
• Viterbi, Forward and Backward algorithms can all be used to determine probability that a model produced an output sequence
Hidden Markov Models
• The model can be “trained” by a collecNon of output sequences
• The Baum-‐Welch algorithm can be used to determine probability of a sequence based on previous sequences
• In the context of our research, packet lengths can be used as the sequences
Hidden Markov Models
• A variaNon of HMM • Introduces Insert and Deletes • Allows the model to idenNfy sequences with Inserts or Deletes
• ParNcularly relevant to analysis of audio codecs where idenNcal uHerances of the same phrase by the same speaker are unlikely to have idenNcal paHerns
Profile Hidden Markov Models
• Consider a model trained to recognise: A B C D
• The model can sNll recognise paHerns with inser&on:
A B X C D
• Or paHerns with dele&on: A B C
Profile Hidden Markov Models
• Largely replaced by HMMs • Measures similarity in sequences that vary in Nme or speed
• Commonly used in speech recogniNon • Useful in our research because of the temporal element
• A packet capture is essenNally a Nme series
Dynamic Time Warping
• Computes a ‘distance’ between two Nme series – DTW distance
• Different to Euclidean distance
• The DTW distance can be used as a metric for ‘closeness’ between the two Nme series
Dynamic Time Warping
Dynamic Time Warping -‐ Example • Consider the following sequences:
– 0 0 0 4 7 14 26 23 8 3 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 – 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 6 13 25 24 9 4 2 0 0 0 0 0
• IniNal analysis suggests they are very different, if comparing from the entry points.
• However there are some similar characterisNcs: – Similar shape – Peaks at around 25 – Could represent the same sequence, but at different Nme
offsets?
0
5
10
15
20
25
30
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30
Series1
Series3
Side Channel A)acks
• Usually connecNons are peer-‐to-‐peer
• We assume that encrypted VoIP traffic can be captured: – Man-‐in-‐the-‐middle – Passive monitoring
• Not beyond the realms of possibility: – “GCHQ taps fibre-‐opNc cables” hHp://www.theguardian.com/uk/2013/jun/21/gchq-‐cables-‐secret-‐world-‐communicaNons-‐nsa
– “China hijacked Internet traffic”hHp://www.zdnet.com/china-‐hijacked-‐uk-‐internet-‐traffic-‐says-‐mcafee-‐3040090910/
Side Channel A)acks
• But what can we get from just a packet capture?
Side Channel A)acks
• Source and DesNnaNon endpoints – Educated guess at language being spoken
• Packet lengths
• Timestamps
Side Channel A)acks
• So what?......
• We now know VBR codecs encode different sounds at variable bit rates
• We now know some VoIP implementaNons use a length preserving cipher to encrypt voice data
Side Channel A)acks
Variable Bit Rate Codec + Length Preserving Cipher =
Side Channel A)acks
Case Study
• ConnecNons are peer-‐to-‐peer • Uses the Opus codec (RFC 6716):
“Opus is more efficient when opera@ng with variable bitrate (VBR) which is the default”
• Skype uses AES encrypNon in integer counter mode
• The resulNng packets are not padded up to size boundaries
Skype Case Study
Skype Case Study
• Although similar phrases will produce a similar paHern, they won’t be idenNcal: – Background noise – Accents – Speed at which they’re spoken
• Simple substring matching won’t work!
Skype Case Study
• The two approaches we chose make use of the NLP techniques: – Profile Hidden Markov Models – Dynamic Time Warping
Skype Case Study
• Both approaches are similar and can be broken down in the following steps: – Train the model for the target phrase – Capture the Skype traffic – “Ask” the model if it’s likely to contain the target phrase
Skype Case Study
• To “train” the model, a lot of test data is required
• We used the TIMIT Corpus data
• Recordings of 630 speakers of eight major dialects of American English
• Each speaker reads a number of “phoneNcally rich” sentences
Skype Case Study -‐ Training
“Why do we need bigger and beHer bombs?”
Skype Case Study -‐ TIMIT
“He ripped down the cellophane carefully, and laid three dogs on the Nn foil.”
Skype Case Study -‐ TIMIT
“That worm a murderer?”
Skype Case Study -‐ TIMIT
• To collect the data we played each of the phrases over a Skype session and logged the packets using tcpdump
for((a=0;a<400;a++)); do /Applications/VLC.app/Contents/MacOS/VLC --no-repeat -I rc --play-and-exit $a.rif ; echo "$a " ; sleep 5 ; done !
Skype Case Study -‐ Training
• PCAP file containing ~400 occurrences of the same spoken phrase
• “Silence” must be parsed out and removed
• Fairly easy -‐ generally, silence observed to be less than 80 bytes
• Unknown spikes to ~100 during silence phases
Skype Case Study -‐ Training
Skype Case Study -‐ Silence
Short excerpt of Skype traffic of the same recording captured 3 Nmes, each separated by 5 seconds of silence:
Approach to idenNfy and remove the silence:
– Find sequences of packets below the silence threshold, ~80 bytes
– Ignore spikes when we’re in a silence phase (i.e. 20 conNnuous packets below the silence threshold)
– Delete the silence phase – Insert a marker to separate the speech phases – integer 222, in our case
– This leaves us with just the speech phases…..
Skype Case Study -‐ Silence
Skype Case Study -‐ Silence
• Biojava provides a useful open source framework – Classes for Profile HMM modeling – BaumWelch for training – A dynamic matrix programming class (DP) for calling into Viterbi for sequence analysis on the PHMM
• We chose this library to implement our aHack
Skype Case Study – PHMM A)ack
• Train the ProfileHMM object using the Baum Welch
• Query Viterbi to calculate a log-‐odds
• Compare the log-‐odds score to a threshold
• If above threshold we have a possible match
• If not, the packet sequence was probably not the target phrase
Skype Case Study – PHMM A)ack
• Same training data as PHMM • Remove silence phases • Take a prototypical sequence and calculate DTW distance of all training data from it
• Determine a typical distance threshold • Calculate DTW distance for test sequence and compare to threshold
• If the distance is within the threshold then likely match
Skype Case Study – DTW A)ack
PHMM Demonstra%on
Skype Case Study – Pre Tes%ng
Skype Case Study – Post Tes%ng Cypher: “I don’t even see the code. All I see is blonde, bruneHe, red-‐head”
• Recall rate of approximately 80%
• False posiNve rate of approximately 20%
• PhoneNcally richer phrases will yield lower false posiNves
• TIMIT corpus: “Young children should avoid exposure to contagious diseases”
PHMM Sta%s%cs
DTW Results
• Similarly to PHMM results, ~80% recall rate
• False posiNve rate of 20% and under – again, as long as your training data is good.
Silent Circle -‐ Results
• Not vulnerable – all data payload lengths are 176 bytes in length!
Wrapping up
• Some guidance in RFC656216
• Padding the RTP payload can provide a reducNon in informaNon leakage
• Constant bitrate codecs should be negoNated during session iniNaNon
Preven%on
• Assess other implementaNons – Google Talk – Microsoc Lync – Avaya VoIP phones – Cisco VoIP phones – Apple FaceTime
• According to Wikipedia, uses RTP and SRTP…Vulnerable?
• Improvements to the algorithms -‐ Apply the Kalman filter?
Further work
• Variable bitrate codecs are unsafe for sensiNve VoIP transmission
• It is possible to deduce spoken conversaNons in encrypted VoIP
• VBR with length preserving encrypted transports like SRTP should be avoided
• Constant bitrate codecs should be used where possible
Conclusions
@domchell @MDSecLabs