Privacy Protection for Life-log Video
Jayashri Chaudhari
November 27, 2007
Department of Electrical and Computer Engineering, University of Kentucky, Lexington, KY 40507
Outline
Motivation and Background
Proposed Life-Log System
Privacy Protection Methodology
  Face Detection and Blocking
  Voice Segmentation and Distortion
Experimental Results
  Segmentation Algorithm Analysis
  Audio Distortion Analysis
Conclusions
What is a Life-Log System?
“A system that records everything, at every moment and everywhere you go”

Applications include:
• Law enforcement
• Police questioning
• Tourism
• Medical questioning
• Journalism
Existing Systems/work
1) “MyLifeBits Project”: At Microsoft Research
2) “WearCam” Project: At University of Toronto, Steve Mann
3) “Cylon Systems”: http://cylonsystems.com, in the UK (a portable body-worn surveillance system)
Technical Challenges
Security and Privacy
Information Management and Storage
Information Retrieval
Knowledge Discovery
Human-Computer Interface
Why Privacy Protection?
Privacy is a fundamental right of every citizen
Emerging technologies threaten that right
There are no clear and uniform rules and regulations regarding video recording
People are resistant toward technologies like life-log
Without tackling these issues, the deployment of such emerging technologies is impossible
Research Contributions
Practical audio-visual privacy protection scheme for life-log systems
Performance measurement (audio) on:
• Privacy protection
• Usability
Proposed Life-log System
“A system that protects the audiovisual privacy of the persons captured by a portable video recording device”
Privacy Protection Scheme
Design Objectives
• Privacy: hide the identity of the subjects being captured
• Privacy versus usefulness: the recording should convey sufficient information to be useful
√ Usefulness, × Privacy
× Usefulness, √ Privacy
√ Usefulness, √ Privacy
Design Objectives

Anonymity or Ambiguity
• The scheme should generate an ambiguous identity for the recorded subjects
• Every individual will look and sound identical, reducing correlation attacks

Speed
• The protection scheme should work in real time

Interview Scenario
• The producer is speaking with a single subject in a relatively quiet room
Privacy Protection Scheme Overview

Pipeline: audio → Audio Segmentation → Audio Distortion; video → Face Detection and Blocking; both streams → Synchronization & Multiplexing → storage

S: Subject (the person who is being recorded)
P: Producer (the person who is the user of the system)
Voice Segmentation and Distortion

For each audio window, the windowed power P_k is computed and compared against two thresholds, T_S and T_P. Depending on which threshold tests (P_k < T_S, P_k < T_P) succeed, the state is set to State_k = Subject or State_k = Producer; otherwise State_k = State_{k-1}. The segmented audio is then passed on for storage.
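The two-threshold state machine above can be sketched in Python. The exact branch ordering is an assumption (here T_P is taken to be the higher threshold, since the producer wears the close-talking microphone), not something recoverable from the slide:

```python
def windowed_power(samples, win):
    """Average power of consecutive non-overlapping windows."""
    return [sum(s * s for s in samples[i:i + win]) / win
            for i in range(0, len(samples) - win + 1, win)]

def segment_windows(powers, t_producer, t_subject):
    """Label each window from its windowed power P_k.

    Assumed rule (a sketch; the talk's exact decision order may differ):
      P_k >= T_P        -> State_k = Producer (close-talking mic, loudest)
      T_S <= P_k < T_P  -> State_k = Subject
      P_k <  T_S        -> State_k = State_{k-1} (hold through silence)
    """
    state, labels = "silence", []
    for p in powers:
        if p >= t_producer:
            state = "producer"
        elif p >= t_subject:
            state = "subject"
        # else: below both thresholds, keep the previous state
        labels.append(state)
    return labels
```

The hold rule in the last branch is what lets short pauses inside a speaker's turn keep the current label rather than flipping to silence.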
Pitch Shifting
We use the PitchSOLA time-domain pitch shifting method.
* “DAFX: Digital Audio Effects” by Udo Zölzer et al.
Pitch Shifting Algorithm

Pitch shifting (Synchronous Overlap and Add):
Step 1) Time stretching by a factor of α: windows of size N are read from the input X1(n) at analysis step size Sa and overlap-added into X2(n) at synthesis step size α·Sa. Each window is placed at the lag of maximum correlation with the already-written output, which reduces discontinuity in phase and pitch, and is then mixed in (cross-faded).
Step 2) Re-sampling by a factor of 1/α to change the pitch.
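The two SOLA steps can be sketched as follows. This is a deliberately simplified version: the default window size, hop, and correlation search range are illustrative choices, not the talk's parameters, and a production implementation would add normalization and windowing refinements:

```python
import numpy as np

def sola_time_stretch(x, alpha, n=256, sa=64, seek=32):
    """Simplified SOLA time stretching by factor alpha.

    Windows of size n are read at analysis hop sa and written at synthesis
    hop round(alpha * sa). Each window is shifted by the lag (within
    +/- seek samples) of maximum correlation with what is already written,
    then cross-faded in, reducing phase/pitch discontinuities.
    Assumes seek < round(alpha * sa).
    """
    ss = int(round(alpha * sa))                 # synthesis step size
    n_frames = len(range(0, len(x) - n, sa))
    out = np.zeros(n_frames * (ss + seek) + 2 * n)
    pos = 0
    for i in range(0, len(x) - n, sa):
        frame = x[i:i + n]
        if pos == 0:                            # first window: copy directly
            out[:n] = frame
            pos = ss
            continue
        ov = n // 2                             # overlap length for the fade
        # find the lag k_m maximizing correlation with the output tail
        best_k, best_c = 0, -np.inf
        for k in range(-seek, seek + 1):
            c = float(np.dot(out[pos + k:pos + k + ov], frame[:ov]))
            if c > best_c:
                best_c, best_k = c, k
        j = pos + best_k
        fade = np.linspace(0.0, 1.0, ov)
        out[j:j + ov] = out[j:j + ov] * (1.0 - fade) + frame[:ov] * fade
        out[j + ov:j + n] = frame[ov:]
        pos = j + ss
    return out[:pos + n]

def pitch_shift(x, alpha, **kw):
    """Pitch-shift by alpha: time-stretch by alpha, then resample by
    1/alpha so the duration is (approximately) restored while pitch scales."""
    y = sola_time_stretch(np.asarray(x, dtype=float), alpha, **kw)
    idx = np.arange(0.0, len(y) - 1, alpha)
    return np.interp(idx, np.arange(len(y)), y)
```

Because the correlation search may drift by a few samples per frame, the output duration only approximately matches the input; that is inherent to time-domain SOLA.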
Face Detection and Blocking

Pipeline: camera → Face Detection → Face Tracking → Subject Selection → Selective Blocking

Face detection is based on Viola & Jones 2001.
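The selective-blocking stage itself reduces to masking the regions the detector reports. The sketch below assumes a Viola-Jones detector (e.g., a pretrained Haar cascade) has already supplied the bounding boxes, and simply blacks out each region in a grayscale frame:

```python
def block_faces(frame, boxes):
    """Black out each face bounding box (x, y, w, h) in a grayscale
    frame given as a list of pixel rows; returns a new frame."""
    out = [row[:] for row in frame]
    height, width = len(out), len(out[0])
    for x, y, w, h in boxes:
        for r in range(max(0, y), min(height, y + h)):
            for c in range(max(0, x), min(width, x + w)):
                out[r][c] = 0   # blocked pixel
    return out
```

A real system would blur or pixelate rather than blacken, and would apply the mask only to boxes the subject-selection stage marks as belonging to bystanders.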
Audio Segmentation Results

[Plot: segmentation output over time, with intervals labeled “Subject talking” and “Producer talking”]
Initial Experiments¹

• Analysis of the segmentation algorithm
• Analysis of the audio distortion algorithm:
  1) Accuracy in hiding identity
  2) Usability after distortion
1: Chaudhari J., S.-C. Cheung, and M. V. Venkatesh. Privacy protection for life-log video. In IEEE Signal Processing Society SAFE 2007: Workshop on Signal Processing Applications for Public Security and Forensics, 2007.
Segmentation Experiment

Experimental Data:
• Interview scenario in a quiet meeting room
• Three interview recordings, each about 1 minute and 30 seconds long

[Timeline of transitions: P, S, P, S, P, PS, Silence]
S: Subject speaking
P: Producer speaking
Segmentation Results

Meeting # | Transitions (ground truth) | Correctly identified transitions | Falsely detected transitions | Precision | Recall
1 | 7 | 6 | 10 | 0.375 | 0.857
2 | 7 | 7 | 5  | 0.583 | 1
3 | 6 | 6 | 10 | 0.353 | 1

Recall = (# correctly identified transitions) / (# transitions in ground truth)
Precision = (# correctly identified transitions) / (# identified transitions)
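The two formulas can be checked directly against the table; for instance, meeting 1's 6 correct and 10 false detections against 7 ground-truth transitions give the reported 0.375 precision and 0.857 recall:

```python
def precision_recall(correct, false_detections, ground_truth):
    """Precision = correct / (correct + false detections);
    recall = correct / (# transitions in ground truth)."""
    precision = correct / (correct + false_detections)
    recall = correct / ground_truth
    return round(precision, 3), round(recall, 3)
```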
Comparison With CMU Segmentation Algorithm

Meeting # | Our Algorithm (Precision / Recall) | CMU Algorithm (Precision / Recall)
1 | 0.375 / 0.857 | 0.667 / 0.57
2 | 0.583 / 1     | 1 / 0.57
3 | 0.353 / 1     | 0.4 / 0.5

CMU audio segmentation algorithm¹ used as benchmark

1: Matthew A. Seigler, Uday Jain, Bhiksha Raj, and Richard M. Stern. Automatic segmentation, classification and clustering of broadcast news audio. In Proceedings of the Ninth Spoken Language Systems Technology Workshop, Harriman, New York, 1997.
Speaker Identification Experiment

Experimental Data:
• 11 test subjects, 2 voice samples from each subject
• One voice sample is used for training and the other for testing
• Public-domain speaker recognition software

Script 1 is used to train the speaker recognition software (Train).
Script 2 is used to test the performance of audio distortion in hiding the identity (Test).
Speaker Identification Results

Person ID | Without Distortion (ID identified) | Distortion 1 (ID identified) | Distortion 2 (ID identified) | Distortion 3 (ID identified)
1  | 1  | 5  | 8  | 5
2  | 2  | 6  | 8  | 6
3  | 3  | 5  | 3  | 5
4  | 4  | 6  | 6  | 5
5  | 5  | 3  | 10 | 6
6  | 6  | 8  | 6  | 5
7  | 7  | 5  | 2  | 5
8  | 8  | 10 | 11 | 5
9  | 9  | 5  | 8  | 5
10 | 10 | 5  | 2  | 5
11 | 11 | 4  | 8  | 5
Error Rate | 0% | 100% | 90.9% | 100%

Distortion 1: (N=2048, Sa=256, α=1.5)
Distortion 2: (N=2048, Sa=300, α=1.1)
Distortion 3: (N=1024, Sa=128, α=1.5)
Usability Experiments

Experimental Data:
• 8 subjects, 2 voice samples from each subject
• One voice sample is used without distortion and the other is distorted
• Manual transcription (5 human testers); “---” marks unrecognized words

Example transcriptions:
1.wav (transcription 1): “This transcription is of undistorted voice --- stored in one dot wav file.”
2.wav (transcription 2): “This transcription is of distorted voice sample --- in two dot wav ---.”
Usability After Distortion

Word Error Rate (WER): standard measure of word recognition error for speech recognition systems

WER = (S + D + I) / N
S = # substitutions
D = # deletions
I = # insertions
N = # words in the reference sample

Tool used: NIST SCLITE tool
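The WER formula corresponds to a word-level edit distance between reference and hypothesis. A minimal sketch (returning the combined (S + D + I) / N rather than breaking out the three error types, as SCLITE does):

```python
def wer(reference, hypothesis):
    """Word error rate (S + D + I) / N via word-level edit distance."""
    ref, hyp = reference.split(), hypothesis.split()
    n, m = len(ref), len(hyp)
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                     # i deletions
    for j in range(m + 1):
        d[0][j] = j                     # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[n][m] / n
```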
Extended Experiments

Data set: TIMIT (Texas Instruments and Massachusetts Institute of Technology) speech corpus

Experimental Setup:
• Allowable range of alpha (α): 0.2–2.0
• Five alpha values (α = 0.5, 0.75, 1, 1.25, 1.40)
• Increased scope of experiments
• “Subjective experiments”: use testers to assess privacy and usability
Privacy Experiments (Speaker Identification)

TIMIT corpus: 630 speakers, 10 audio clips per speaker
Our experiments: 30 speakers, 5 audio clips per speaker
Sets: Set A (α=1), Set B (α=0.5), Set C (α=0.75), Set D (α=1.25), Set E (α=1.40)

• Total of 30 audio clips in each set
• The audio clips from each set are re-divided into five groups (1–5)
• Each group consists of 6 audio clips randomly selected from each set
• Each group was assigned to three testers, who were asked to do 3 tasks
Experimental Setup

Task 1: Transcribe the audio clips in the assigned group.
Purpose: Determine usability of the recording after distortion.
Results / metrics:
• WER for each transcription by each tester
• Average WER for each clip across the 3 testers
• WER per speaker for the given alpha (α) value
Subjective Experiments (Effect of Distortion on WER)

[Plot: average WER percentage (0–120) per speaker, Person ID 1–30, for Sets A–E]
[Plot: average WER per speaker for each alpha value; Sets A, C, D, E on y-ranges 0–30, 0–60, 0–35, 0–35]
Average WER per Set

Set | Avg WER
A | 14.2
B | 100
C | 22.4
D | 15.3
E | 14.4
Statistical Analysis: Z-test Calculations

Null Hypothesis: the average WER does not change from Set A (before distortion) after distortion for a given value of the pitch-scaling parameter (alpha)
H0: p1 = p2 (null hypothesis)    Ha: p1 != p2

Z-test parameters:
Parameter | Value
Population size | 12*30 = 360
α | 0.05
Confidence level | 95%
Z-test critical ( |Z_{α/2}| ) | 1.96
Rule for rejection of H0: Z >= Z_{α/2} or Z <= -Z_{α/2}

Z-test results:
Comparison | Statistic
Set A and B (0.50) | 46.71 >= 1.96
Set A and C (0.75) | 2.873 >= 1.96
Set A and D (1.25) | 0.419 <= 1.96
Set A and E (1.40) | 0.0695 <= 1.96
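One way to carry out such a comparison is a pooled two-proportion z-test, treating WER as the proportion of misrecognized words. The slides do not give the exact formula used, so this sketch will not reproduce the reported statistics exactly; it only illustrates the decision rule:

```python
import math

def two_proportion_z(p1, p2, n1, n2):
    """Pooled two-proportion z-statistic for H0: p1 == p2."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)     # pooled proportion
    se = math.sqrt(p * (1.0 - p) * (1.0 / n1 + 1.0 / n2))
    return (p1 - p2) / se

def reject_h0(z, z_crit=1.96):
    """Two-sided rejection rule: |Z| >= Z_{alpha/2}."""
    return abs(z) >= z_crit
```

For example, comparing Set B's average WER (100%) against Set A's (14.2%) at n = 360 rejects H0, while identical proportions do not.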
Subjective Experiments

Task 2: Identify the number of distinct voices in each subset in the assigned group.
Purpose: Estimate the ambiguity created by pitch shifting.
Results (average # of distinct voices per subset; each subset consists of 6 audio clips):

Group | Subset of A | Subset of B | Subset of C | Subset of D | Subset of E
1 | 6.0 | 3.33 | 4.33 | 4.0  | 3.33
2 | 6.0 | 3.0  | 3.33 | 4.0  | 4.0
3 | 6.0 | 2.0  | 4.0  | 3.0  | 4.0
4 | 6.0 | 2.67 | 4.0  | 3.67 | 2.67
5 | 6.0 | 3.0  | 3.0  | 3.67 | 4.0
Average | 6.0 | 2.75 | 3.92 | 3.67 | 3.50
Subjective Experiments

Task 3: For each clip from the subset of Set A (the original, undistorted speech set), identify a clip in the other subsets in which the same speaker may be speaking.
Purpose: Qualitatively measure the assurance of privacy protection achieved by distortion.
Results: None of the speakers from Set A was identified in the other, distorted sets (100% recognition error rate).
Privacy Experiments

Speaker Identification Experiments:
• ASR tools (LIA_Spk-Det and ALIZE)¹ by the LIA lab at the University of Avignon
• Speaker verification tool: GMM-UBM (Gaussian Mixture Model – Universal Background Model)
• Single speaker-independent background model
• Decision: likelihood ratio p(Y|H0) / p(Y|H1)

1: Bonastre, J.-F., Wild, F., ALIZE: a free, open tool for speaker recognition, http://www.lia.univ-avignon.fr/heberges/ALIZE/
LIA_RAL Speaker-Det

Front processing:
• Feature extraction (SPRO tool): 32 coefficients = 16 LFCC + 16 derivative coefficients (SPRO4)
• Silence frame removal (EnergyDetector)
• Parameter normalization (NormFeat), warping

World modeling (TrainWorld): initialization and training of 2 GMMs (2048 components each) — 1: male, 2: female
Target speaker modeling: Bayesian adaptation (MAP) of the world model (TrainTarget)
Speaker detection (ComputeTest) on the feature vectors:

LLR(s, T) = log [ l(s|T) / l(s|W) ]
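The GMM-UBM decision amounts to an average per-frame log-likelihood ratio between the target model and the world model. A one-dimensional sketch (real systems such as LIA_RAL use 32-dimensional features and 2048-component GMMs; the toy models below are illustrative only):

```python
import math

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_gmm(x, weights, means, variances):
    """Log likelihood of x under a 1-D GMM, via log-sum-exp."""
    logs = [math.log(w) + log_gauss(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    mx = max(logs)
    return mx + math.log(sum(math.exp(l - mx) for l in logs))

def llr(frames, target, world):
    """Average per-frame log-likelihood ratio
    LLR = (1/N) * sum_t [log l(x_t|target) - log l(x_t|world)];
    accept the claimed speaker when it exceeds a threshold."""
    total = sum(log_gmm(x, *target) - log_gmm(x, *world) for x in frames)
    return total / len(frames)
```

Frames that fit the target model better than the world model push the LLR positive; frames better explained by the background push it negative.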
Experimental Setup

World model:
• Number of male speakers = 325
• Number of female speakers = 135

Target speaker models:
• Number of male test clips = 20
• Number of female test clips = 10

Two sets of experiments:
• Same Model: world model and individual speaker models trained on distorted speech with the corresponding alpha
• Cross Model: world model and individual speaker models trained on undistorted speech
Privacy Results

Set | Sex | Same Model | Cross Model
Set A | M | 1.0   | 1.0
Set A | F | 4.4   | 4.4
Set B | M | 2.5   | 150.75
Set B | F | 1.7   | 57.80
Set C | M | 8.65  | 170.90
Set C | F | 5.4   | 46.40
Set D | M | -     | 185.75
Set D | F | 20.30 | 67.80
Set E | M | 52.05 | 157.45
Set E | F | 29.20 | 79.80

Numbers in the table are the average rank of the true speaker of the test clips for the corresponding alpha value.

Conclusions:
• Cross Model: distorted speech, no matter what alpha value is used, is very different from the original speech.
• Same Model: Set B and Set C do not provide adequate protection, as the rank is still very near the top.
Example Video
Conclusions

• Proposed a real-time implementation of voice distortion and face blocking for privacy protection in life-log video
• Analysis of audio segmentation
• Analysis of audio distortion for usability
• Analysis of audio distortion for privacy protection
Acknowledgment

• Prof. Samson Cheung
• People at the Center for Visualization and Virtual Environments
• Prof. Donohue and Prof. Zhang
Thank you!
Voice Distortion

Voice identity:
• Vocal tract (formants): filters
• Vocal cords (pitch): excitation source

Different ways to distort audio:
• Random mixture: makes the recording useless
• Voice transformation: more complex, not suitable for real-time applications
• Pitch shifting: changes the pitch of the voice while keeping the recording useful; simple, with low complexity

PitchSOLA time-domain pitch shifting method.*
* “DAFX: Digital Audio Effects” by Udo Zölzer et al.
• Cross Model: world model and individual speaker models (training set: undistorted speech)
• Same Model: world model and individual speaker models (training set: distorted speech)