25
GYROPHONE RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS Yan Michalevsky (1) , Gabi Nakibly (2) and Dan Boneh (1) (1) Stanford University, (2) National Research & Simulation Center, Rafael Ltd. Presented at Usenix Security '14 and Black Hat Europe '14 0

RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

Page 1: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

GYROPHONERECOGNIZING SPEECH FROM GYROSCOPE

SIGNALSYan Michalevsky (1), Gabi Nakibly (2) and Dan Boneh (1)

(1) Stanford University, (2) National Research & Simulation Center, Rafael Ltd.

Presented at Usenix Security '14 and Black Hat Europe '140

Page 2: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

ABOUT MESecurity research leader

National Research & Simulation Center

part of Rafael – Advanced Defense Systems Ltd.

High-end research and consulting services

Adjunct researcher and lecturer

The Technion (Israel Institute of Technology)

Page 3: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING
Page 4: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

MICROPHONE ACCESS

REQUIRES PERMISSIONS

Page 5: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

GYROSCOPE ACCESS

DOES NOT REQUIRE PERMISSIONS

Page 6: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

GYROSCOPE ACCESS BY A BROWSERJAVASCRIPT CODE EXAMPLE

if(window.DeviceOrientationEvent) {    window.addEventListener('deviceorientation',      function(event) {        var Xrotation = event.alpha;        var Yrotation = event.beta;        var Zrotation = event.gamma;    });}

ANY WEBSITE CAN ACCESS THE PHONE'S GYRO

Page 7: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

MEMS GYROSCOPESMajor vendors:

STMicroelectronics (Galaxy, iPhones up to 5s)

InvenSense (Google Nexus, iPhone 6)

Page 8: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

GYROSCOPES ARE SUSCEPTIBLE TO SOUND

50 HZ TONE POWER SPECTRAL DENSITY

Page 9: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

GYROSCOPES ARE SUSCEPTIBLE TO SOUND

50-100 HZ CHIRP POWER SPECTRAL DENSITY(MEASURED USING JAVASCRIPT)

Page 10: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

GYROSCOPES ARE (LOUSY, BUT STILL)MICROPHONES

Feature Microphone Gyroscope

Sample Width 16 bits 16 bits

Self Noise ~20dB ~70dB

Sample Freq. 44.1KHz 200Hz

Page 11: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

THIS SAMPLING FREQUENCY IS VERY LOW

SPEECH RANGE (FUNDAMENTAL FREQ.)Adult male 85 - 180 Hz

Adult female 165 - 255 Hz

Page 12: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

THE EFFECTS OF LOW SAMPLING FREQUENCYSPEECH SAMPLED AT 8,000HZ

0:00

SPEECH SAMPLED AT 200HZ

0:00

Page 13: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

EXPERIMENTAL SETUPRoom. Simple speakers.

Smartphone.

Subset of TIDigits speech

recognition corpus

10 speakers X 11 samples X 2

pronunciations = 220 total

samples

Page 14: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

SPEECH ANALYSIS USING A SINGLEGYROSCOPE

Gender identification

Speaker identification

Isolated word recognition

Page 15: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

RECOGNITION ALGORITHMSSTANDARD FEATURES

Mel-Frequency Cepstral

Coefficients

Spectral centroid

RMS energy

Short-Time Fourier Transform

STANDARD CLASSIFIERSMulti-class SVM

Gaussian Mixture

Model

Dynamic Time Warping

Page 16: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

WE CAN SUCCESSFULLY IDENTIFY GENDER

NEXUS 4 84% (DTW)

GALAXY S III 82% (SVM)

Random guess probability is 50%

Page 17: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

A GOOD CHANCE TO IDENTIFY THE SPEAKER

Mixed Female/Male 50% (DTW)

Female speakers 45% (DTW)

Male speakers 65% (DTW)Nex

us 4

Random guess probability is 20% for one gender and 10% for amixed set

Page 18: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

ISOLATED WORDS RECOGNITIONSPEAKER INDEPENDENTMixed Female/Male 17% (DTW)

Female speakers 26% (DTW)

Male speakers 23% (DTW)Nex

us 4

Random guess probability is 9%

Page 19: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

ISOLATED WORDS RECOGNITIONSPEAKER DEPENDENT

Confusion matrix corresponds to the DTW results

Random guess probability is 9%

Page 20: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

HOW CAN WE LEVERAGE EAVESDROPPINGSIMULTANEOUSLY ON TWO DEVICES?

Page 21: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

(Tested for the case of speaker dependent word recognition)

EVALUATION

Single device Two devices

Exhibits improvement over using a single device

Using even more devices might yield even better results

Not a proper non-uniform reconstruction

Page 22: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

DEFENSES

Page 23: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

DEFENSES

Restrict the sampling frequency even further

Access to high sampling rate should require a special

permission

Low-pass filter the raw samples

Acoustic masking

(won't help against vibration of the surface)

Page 24: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

CONCLUSIONThe are many sensors in modern smartphone devices

They can be exploited in various unexpected ways

Giving applications direct access to hardware is dangerous

Page 25: RECOGNIZING SPEECH FROM GYROSCOPE SIGNALS€¦ · Subset of TIDigits speech recognition corpus 10 speakers X 11 samples X 2 pronunciations = 220 total samples. SPEECH ANALYSIS USING

THANK YOU VERY MUCH

QUESTIONS?CRYPTO.STANFORD.EDU/GYROPHONE