Big ASC Meeting 15-16 April 2010 1
Calibration
Roland Goecke
Trent Lewis
Michael Wagner
What is Calibration?
Calibration is not so much a data collection process, although that will happen to some extent as well
  Rather, it is about ensuring that the hardware components and black box setup are all correct
  Or at least that the settings have been recorded for subsequent analysis
Occurs before the actual data recording takes place
What is included?
Equipment checking
  Is the audio and video capturing software running?
  Are the lights set up correctly?
“Recording” of environmental settings
  What is the light level?
  What is the acoustic background noise level?
  What are the distances between camera(s) and microphone(s)?
“Recording” of subject calibration sequences
  Face turning
  Lip movements
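One way to keep the equipment checks and environmental settings listed above together is a small per-session record. The following is only a sketch: the class name, field names, units (lux, metres) and example values are illustrative assumptions, not something the slides prescribe.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class EnvironmentCalibration:
    """Hypothetical per-session record of equipment checks and settings."""

    location: str
    # Light meter reading; lux is an assumed unit.
    light_level_lux: Optional[float] = None
    # Acoustic background noise level; left empty until it has been measured.
    background_noise_level: Optional[float] = None
    # Manually measured camera-to-microphone distances (metres assumed).
    camera_to_microphone_m: List[float] = field(default_factory=list)
    # Outcome of each equipment check, keyed by a short description.
    equipment_checks: Dict[str, bool] = field(default_factory=dict)

    def all_checks_passed(self) -> bool:
        """True only if at least one check was recorded and none failed."""
        return bool(self.equipment_checks) and all(self.equipment_checks.values())


# Placeholder values for illustration only.
record = EnvironmentCalibration(
    location="example site",
    light_level_lux=450.0,
    camera_to_microphone_m=[1.2],
    equipment_checks={
        "audio_capture_running": True,
        "video_capture_running": True,
        "lights_set_up": True,
    },
)
print(json.dumps(asdict(record), indent=2))
```

Storing the record as JSON next to the recordings would keep the settings available for subsequent analysis even if a setup turns out to have been imperfect.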
Why is this Data Important?
The calibration data is potentially fundamental to everyone who will use the corpus.
To name just a few research areas that will particularly pay attention to the calibration data:
  A and AV speech recognition
  A and AV speaker recognition
  Biometrics (face recognition, face-voice recognition)
  Speech Perception/Psycho-Acoustics researchers
  Speech and Hearing researchers
Hardware and Software Requirements
Normal recording equipment and software
An additional light meter would be useful to measure the ‘global’ level of light in the recording environment
  Do we need to do something similar for measuring the acoustic background noise?
Swivel chair to place subject in
  Assists the capturing of the face/head from different angles
  We want the subjects to turn with the chair, not just turn their heads
  This is more accurate
Masking tape to mark chair position, angles, etc.
Metronome (AV synchronisation)
Collection Process – Step 1
2-step process
Step 1 – Record environment without subject
  At the beginning of each session or, for sessions running over longer periods of time, once every hour in case the environmental conditions have changed
  Audio and video recording of the recording environment without a subject present (30s)
  Audio and video recording of the metronome in the scene (30s)
  Measurement of location of light sources and distance to camera(s) (manual measurement)
  Check camera output is being recorded
  Check microphone output is being recorded
Time: 5min
Collection Process – Step 2
Step 2 – Person specific calibration
  At the beginning of each recording session with a subject
  Sit subject on swivel chair. Measure distances camera(s) to subject and microphone(s) to subject (manual measurement)
  Turn subject to 90° left. It is important that the subject turns their entire body on the swivel chair such that the face (nose?) points in the required direction.
    We will need markers both on the floor and on the walls in 15° intervals to facilitate the correct turning of the subjects.
  Turn subjects to every 15° from -90° (left profile) to +90° (right profile), take 2s at each position
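The turning sequence can be written down as a simple schedule, which may help when placing the floor and wall markers or prompting the subject. This is a minimal sketch: the function name and its parameters are hypothetical, and it counts only the 2s holding time at each position, not the time needed to turn the chair between positions.

```python
def turning_schedule(start_deg=-90, stop_deg=90, step_deg=15, hold_s=2.0):
    """Return (angle, hold_start_s, hold_end_s) for every chair position.

    Negative angles are towards the left profile, positive towards the right.
    Time spent turning between positions is deliberately ignored.
    """
    angles = range(start_deg, stop_deg + 1, step_deg)
    return [(angle, i * hold_s, (i + 1) * hold_s) for i, angle in enumerate(angles)]


schedule = turning_schedule()
for angle, t0, t1 in schedule:
    print(f"{angle:+4d} deg: hold from {t0:4.1f}s to {t1:4.1f}s")

# 13 positions at 2s each give 26s of holding time in total.
print(len(schedule), "positions,", schedule[-1][2], "s holding time")
```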
Collection Process – Step 2
Let subject face camera frontally.
Participants are to say the following two lip movement calibration sequences for 5s each:
  e o e o e o … (testing lip rounding)
  ba ba ba … (testing vertical mouth opening)
  This is similar to what was done in the AVOZES corpus and turned out to be quite useful for gaining some understanding of the range of lip movements a subject makes
  Other sequences are possible
Time: 5min
Coding and Annotation
No coding or annotation required as such
Want to take note of the environmental conditions in which the recordings take place
  Light level
  Can the acoustic base level, i.e. the level measured from the recorded audio stream when no one is talking, be sufficiently determined from the recordings without a subject? If so, no extra measurements required here.
  Distance of camera(s) to subject(s)
  Distance of microphone(s) to subject(s), e.g. to chin or mouth
  Location of light sources and distance to camera(s) or subject(s) (we may need a sketch of the recording environment for each location)
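On the question of estimating the acoustic base level from the recordings without a subject, one possible approach is to compute the RMS level of the 30s environment recording. The sketch below assumes the audio is stored as 16-bit PCM WAV, which the slides do not specify, and the function names are hypothetical. The result is in dB relative to digital full scale (dBFS), so it is comparable only between recordings made with the same microphone and gain settings; an absolute sound pressure level would still need a calibrated reference.

```python
import array
import math
import sys
import wave


def rms_dbfs(samples, full_scale=32768.0):
    """RMS level of integer PCM samples in dB relative to full scale."""
    if len(samples) == 0:
        raise ValueError("no samples given")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 10.0 * math.log10(mean_square / (full_scale * full_scale))


def noise_floor_from_wav(path):
    """Overall RMS level (dBFS) of a 16-bit PCM WAV file, all channels pooled."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return rms_dbfs(samples)


# Synthetic check: a square wave at half of full scale.
print(round(rms_dbfs([16384, -16384] * 100), 2), "dBFS")
```

Calling `noise_floor_from_wav` on the subject-free recording of each session would give one figure per session to note alongside the light level and distances.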