NEURAL VISUAL OBJECTS ENHANCEMENT FOR SENSORIAL SUBSTITUTION FROM VISION TO AUDITION
Damien Lescal, Louis-Charles Caron and Jean Rouat NECOTIS, Université de Sherbrooke, GEGI, Sherbrooke QC, Canada, J1K 2R1
Introduction
Existing sensorial substitution systems: from vision to touch [1] and from vision to audition [2].
Studies have shown that simple tasks such as locating objects [3], recognizing shapes [4,5] and reading can be achieved with these systems.
Object-based approach: different methods exist to find and track objects [6] or to represent the structure of objects as graphs [7], but they need large data sets and require prior knowledge of the visual environment.
Overview of the substitution system
[Block diagram of the system: (1) captured image → (2) enhanced saliencies → (3) sounds 1, 2, 3 for the regions of interest → (4) filters F1, F2, F3 → (5) sum into the final sound]
1. Captured image: each pixel of the image is captured and represented by one neuron.
2. Enhanced saliencies: strongly connected neurons form clusters which represent regions of interest in the image.
3. Sound generation: each region is then associated with a simple sound.
4. Filtering: each sound is then filtered according to its position in the image, using a Virtual Acoustic Space (VAS) model [11].
5. Complex sound: all simple sounds are added to form a complex sound which represents the visual scene (a code sketch of this pipeline follows the list).
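A minimal sketch of how these five steps could be chained, in Python. The helper callables enhance_saliencies and apply_hrtf, and the region attributes pitch and position, are hypothetical names used only for illustration (the enhancement network and the VAS filtering are sketched in the following sections):

```python
import numpy as np

def visual_scene_to_sound(image, enhance_saliencies, apply_hrtf,
                          sample_rate=44100, duration=0.5):
    """Hypothetical glue code for steps 1-5 of the overview.

    enhance_saliencies(image) -> iterable of regions (step 2) and
    apply_hrtf(sound, position) -> (len(sound), 2) stereo array (step 4)
    are passed in as callables; 'pitch' and 'position' are illustrative
    region attributes, not names from the poster.
    """
    regions = enhance_saliencies(image)                   # step 2: regions of interest
    t = np.arange(int(sample_rate * duration)) / sample_rate
    mix = np.zeros((t.size, 2))                           # stereo output buffer
    for region in regions:
        tone = np.sin(2 * np.pi * region.pitch * t)       # step 3: one simple sound per region
        mix += apply_hrtf(tone, region.position)          # steps 4-5: filter and sum
    return mix                                            # final complex sound
```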
Object enhancement algorithm
Each pixel p of the input image is represented by one neuron n in the network, and each neuron is connected to its 8 neighboring neurons by synapses sij (for example, a 3×3 patch of pixels p1–p9 maps to neurons n1–n9, with synapses s15, s25, …, s95 linking n5 to its neighbors).
Weight of the synapse between two neurons:
sij.w = sji.w = f(abs(ni.p − nj.p))
where sij.w is the weight of the synapse connecting neurons ni and nj, f() is a possibly nonlinear function and ni.p is neuron ni's pixel value.
Computation of an iteration:
ni.s(k) = ( ni.s(k−1) + Σj nj.s(k−1) × sij.w ) / NORM
where ni.s(k) is neuron ni's state at iteration k and NORM equals 9 in this case.
Thresholding:
ni.s(k) ← 0 if ni.s(k) ≤ THRESH, and ni.s(k) is kept unchanged otherwise.
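A minimal NumPy sketch of this update rule, under stated assumptions: the poster does not specify f, the initial neuron states, or when thresholding is applied, so exp(−|·|), the pixel values themselves, and per-iteration thresholding are used here for illustration only:

```python
import numpy as np

def enhance_objects(image, iters=10, thresh=0.5, norm=9.0):
    # Sketch of the pixel-to-neuron network described above, for an 8-bit
    # grayscale image given as a 2-D array.
    img = image.astype(float) / 255.0
    h, w = img.shape
    state = img.copy()                                   # ni.s(0): assumed to start at the pixel value
    offsets = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
    for _ in range(iters):
        new_state = state.copy()                         # ni.s(k-1) term
        for di, dj in offsets:
            nb_state = np.zeros_like(state)              # nj.s(k-1), zero-padded at the borders
            nb_pix = np.zeros_like(img)                  # nj.p
            src = (slice(max(0, -di), h - max(0, di)), slice(max(0, -dj), w - max(0, dj)))
            dst = (slice(max(0, di), h - max(0, -di)), slice(max(0, dj), w - max(0, -dj)))
            nb_state[dst] = state[src]
            nb_pix[dst] = img[src]
            weight = np.exp(-np.abs(img - nb_pix))       # sij.w = f(abs(ni.p - nj.p)), f assumed exp(-|.|)
            new_state += nb_state * weight               # + nj.s(k-1) * sij.w
        state = new_state / norm                         # NORM = 9 (the neuron plus its 8 neighbors)
        state[state <= thresh] = 0.0                     # thresholding (timing assumed per iteration)
    return state                                         # high values mark regions of interest
```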
The sounds are created in two steps:
• Recording the transfer function of the head: HRTFs are measured by placing miniature probe microphones into the subject's ears and recording the impulse responses to broad-band sounds presented from a range of directions in space.
• Playing back the sounds through a VAS filter: the bank of HRTF impulse responses is converted into a filter bank. Any desired sound can be convolved with one of these filters and played over headphones, which creates the perception of an externalized sound source.
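A minimal sketch of this playback step. The hrtf_bank layout (a mapping from direction to a pair of measured impulse responses) is an assumption for illustration, not the poster's data format:

```python
import numpy as np

def vas_filter(sound, direction, hrtf_bank):
    # Convolve a mono sound with the left/right HRTF impulse responses measured
    # for the requested direction; playing the result over headphones gives the
    # impression of an externalized source in that direction.
    # 'hrtf_bank' is assumed to map a direction to (left_ir, right_ir).
    h_left, h_right = hrtf_bank[direction]
    left = np.convolve(sound, h_left)
    right = np.convolve(sound, h_right)
    return np.stack([left, right], axis=-1)      # stereo signal (left, right)
```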
Discussion
• How much information from the visual scene can be carried by the auditory pathway?
• Would our system be fast enough to process images and generate sound in real time?
References
[1] P. Bach-y-Rita et al., "Vision substitution by tactile image projection," Nature, vol. 221, pp. 963–964, 1969.
[2] P. B. L. Meijer, "An experimental system for auditory image representations," IEEE Transactions on Biomedical Engineering, vol. 39, no. 2, pp. 112–121, 1992.
[3] G. Jansson, "Tactile guidance of movement," International Journal of Neuroscience, vol. 19, pp. 37–46, 1983.
[4] E. Sampaio et al., "Brain plasticity: 'Visual' acuity of blind persons via the tongue," Brain Research, vol. 908, pp. 204–207, 2001.
[5] K. A. Kaczmarek and S. J. Haase, "Pattern identification… electrotactile display," IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 11, pp. 9–16, 2003.
[6] I. Kokkinos and A. Yuille, "HOP: Hierarchical object parsing," in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009, pp. 802–809.
[7] Michael I. Jordan (ed.), Learning in Graphical Models, MIT Press, 1998.
[8] P. M. Milner, "A model for visual shape recognition," Psychological Review, vol. 81, no. 6, pp. 521–535, Nov. 1974.
[9] Christoph von der Malsburg, "The correlation theory of brain function," Models of Neural Networks II: Temporal Aspects … Biological Systems, no. July 1981, pp. 95–119, 1994.
[10] Louis-Charles Caron et al., "FPGA implementation of a spiking neural network for pattern matching," in IEEE Int. Symp. on Circuits and Systems, Rio de Janeiro, 15–18 May 2011, p. 1342.
[11] J. Schnupp et al., Auditory Neuroscience: Making Sense of Sound, The MIT Press, 2011.