A dual fast and slow feature interaction in biologically inspired visual recognition of human action

Bardia Yousefi, Chu Kiong Loo
Department of Artificial Intelligence, Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia

Applied Soft Computing 62 (2017) 57–72. https://doi.org/10.1016/j.asoc.2017.10.021
Received 8 April 2015; received in revised form 3 September 2017; accepted 11 October 2017. © 2017 Elsevier B.V. All rights reserved.

Keywords: Biologically inspired model; Interaction between ventral and dorsal stream; Human action recognition; Incremental slow feature analysis; Extreme learning machine; Dual processing pathways

Abstract

Computational neuroscience studies have examined the human visual system through functional magnetic resonance imaging (fMRI) and identified a model in which the mammalian brain pursues two independent pathways for recognizing biological movements. On the one hand, the dorsal stream analyzes motion information by applying optical flow, which provides the fast features. On the other hand, the ventral stream analyzes form information with slow features. The proposed approach suggests that motion perception in the human visual system comprises an interaction of fast and slow features to identify biological movements. The form features in the visual system follow the application of the active basis model (ABM) with incremental slow feature analysis (IncSFA). Episodic observation is required to extract the slowest features, whereas the fast features update the processing of motion information in every frame. Applying IncSFA provides an opportunity to abstract human actions and use action prototypes. The fast features are obtained from optical flow division, which allows the two pathways to interact: the final recognition is performed on a combination of the optical flow and ABM-IncSFA information through a kernel extreme learning machine. Applying IncSFA in the ventral stream and involving slow and fast features in the recognition mechanism are the major contributions of this research. Results on two benchmark human action datasets (KTH and Weizmann) highlight the promising performance of this modified model.

1. Introduction

A biologically inspired mechanism for human action recognition provides a new horizon in the fields of computer vision and video processing. Neurophysiological, physiological, and psychophysical evidence suggests two independent pathways in the system configuration, in which motion and form information are provided. The slowness principle provides more stable form information than current approaches [1–6] because of its independence from fast temporal variations. In the motion pathway (dorsal stream), the motion information (dynamic pattern) is analyzed locally or globally [7–12] and is extracted by an optical flow that creates fast features. In the form pathway (ventral stream), the active basis model (ABM) [13] based incremental slow feature analysis (IncSFA) [14] generates slow features corresponding to the form of movements. The interaction between these two pathways defines the combination of information for making a recognition decision. Biologically inspired models have been analyzed in many research fields, such as artificial intelligence (e.g., early detection of the action [15] or fusion of homogeneous convolutional neural network classifiers [16]), soft computing (e.g., fuzzy shortest path problems [17]), computer vision (e.g., a human action recognition system with a projection-based metacognitive learning classifier [18]), and computational neuroscience, all of which follow the biological evidence. This research also follows the same evidence to develop a model for recognizing biological movements.

Reports of temporal illusions, such as distortions of time perception [19], binding synchronization of object features [20], and the perception of time in object motion [21], have substantially increased. These illusions and the functional organization of visual systems are generally understood through temporal limits [22]. Multiple temporal resolutions arise from the temporal intervals between video frames and can be considered in the perception of biological movements. Biological movements are recognized despite variability in lighting conditions, human object locations, or even perspectives over time. Such invariance represents an ability of the visual system [23] and creates a demand for a technique that is robust against fast-varying parameters.


In the primary sensory system, even a minimal signal variation in the human object in time may generate different stimuli in the recognition results. Thus, considering the changes in time, the brain must produce different stimuli for representing the underlying cause, thereby creating an inner representation that does not change when irrelevant variations occur across time. The proposed approach in this work attempts to answer the question of how such unchanged representations can be established and how they influence the recognition of biological movements.

Invariant representations can be successfully learned via the adaptation of input sensory signals to the extraction of slowly varying features. The temporal stability in the slowness principle is fundamental in the creation of slow features [23–25]. Applications of this approach have concentrated on visual system models that are predominantly focused on the self-organized configuration of complex cell receptive fields in the primary visual cortex [26,27]. Given that biological movement is an understandable task in the mechanism, the recognition of such actions is related to temporal movements and their visual understanding in the human brain. The proposed modification not only analyzes the slow or fast features but also strives to employ psychophysical and neurophysiological evidence showing the independence of the pathways [1,5,28,6,29,30].

The proposed approach applies ABM followed by the slowness principle combined with motion information, which provides a different perspective toward the original model for recognizing biological movements [1]. The modification in the model occurs through the slow and fast features in the ventral and dorsal processing streams. The new development in the modeling of the ventral stream utilizes the slowness principle for the active basis calculated from the human object form. The two pathways interact in the decision-making part, where significant results in human action recognition are obtained. However, each pathway separately facilitates the recognition of biological movements, with a considerable disparity rate. Concrete evidence for this comes from the analysis of the performance of two patients: DF, who developed visual agnosia (i.e., damage to the ventrolateral occipital cortex), and RV, who developed optic ataxia (i.e., damage to the occipitoparietal cortex) [31]. The proposed approach contributes the integration of the slowness principle into the model and the introduction of the slow and fast features concept (static and dynamic patterns [30]). This article is organized as follows. The next section presents the biological motivation. Sections 3 and 4 discuss our methodology and evaluate the simulation results and accuracy of the system, respectively. Section 5 presents the discussion, while Section 6 concludes the paper.

2. Biological motivation

The mechanism for recognizing biological movements can be split into two pre-processing streams [33–37] that parallel the dorsal and ventral streams, which are specialized for analyzing motion and form information, respectively. The two streams have neural motion detectors and hierarchical form feature extraction, which enables size and style independence in both pathways; the features generated by both feed-forward pathways are then classified to categorize biological movements. The results for stationary biological motion recognition indicate that discrimination can be accomplished with small latencies, which makes an important role of top-down signals unlikely [34]. The body shapes are determined through a set of patterns similar to sequences of "snapshots" [1] that maintain constant features throughout the entire action episode. The proposed approach expands upon an earlier model used for stationary object recognition [1–4,37,38] by adding and combining temporal information in the pathways. This method can determine the appropriate quantity of existing information for organizing, summarizing, and interpreting by referring to data from neurophysiological studies.

The proposed approach quantitatively develops the original model for temporal analysis and computer simulations with respect to the previous model architecture. The slow and fast features are not used as criteria for measuring speed; rather, they measure the rate of variation in the features. The motion pathway represents the function of the dorsal stream in the mechanism and involves information related to optical flow, which has a fast temporal variation. Fast-varying features (optical flow features) refer to the short changes between Frame(t) and Frame(t+1) rather than to the entire episode (unlike the ABM-based IncSFA in the form pathway). The local detector of optical flow is connected to motion patterns, and the model is composed of a population of direction-selective neurons in area MT. However, MT and V4 are connected for motion and direction selection. Motion edge selectors in two opposite directions are present in the areas MT, MSTd, and MSTl [39,40] and in multiple regions of the dorsal stream (most likely in the kinetic occipital area, KO) [1], and motion-selective edges similar to MT [39] and MSTl [40] exist in macaque monkeys. Few plausible models have been proposed for human body shape recognition that are considered appropriate for the neurophysiological recognition of the stationary form [37]. The proposed approach uses ABM [13], which is inspired by V1-like Gabor filters applied to the luminance image (in V1) with contrast normalization of filter outputs and summation of energy (by SUM-MAX in ABM) [41]. The addition of ABM, a supervised method that uses Gabor filters, into the mechanism [5,28,42,6,29] increases the constancy of simple cells [43]. The complex cells in V1, V2, or V4 are invariant in terms of the position of varying responses [1], whereas size independence is typically present in V4. V2 and V4 are more selective for complex form features (e.g., junctions and corners) but are not suitable for motion recognition because of the temporal dependency in these two pathways. The snapshot detectors are used to identify the shape models, similar to the inferotemporal cortex (IT; Fig. 1) of monkeys, where the view-tuned neurons are located and the model of complex shapes is tuned [39]. Snapshot neurons are similar to the view-tuned neurons in the area of IT and provide scale and position independence. Previous models have used Gaussian radial basis functions for modeling and were adjusted during training, where they generate a key frame for the training sequences. This key frame can be considered a fast feature (motion pathway) as it is defined over time frames rather than episodic changes. Slowness features, which have Gabor-like properties, are highly representative of the form information of biological movements. The proposed approach strongly suggests that motion contains spontaneous temporal variations and that form features are temporally slow-varying, ordered information throughout the movements. The novelty of this approach is more pronounced in computational intelligence and soft computing than in other relevant fields. Our development of the model follows the previous models and modifies the form pathway by applying ABM-based IncSFA, as explained in the Method section. The computational simulations and testing method are presented in the Results section. The biological motion perception in the human visual system highlights an association between fast and slow features, which allows the recognition of biological movements.

3. Method

The proposed approach modifies the recognition mechanism of

a biologically inspired model [1] in the visual system and aggregates two types of information, namely, form information (slow features) and motion information (fast features). In the experiment, approximately 38,000 cuboid frames of human movements were analyzed by using the proposed model.


Fig. 1. Overall structure of the visual system analytical models is depicted here. The hierarchical model follows the original model; the interpretation of the data reflects the perspective of a combination of slowness and fast features provided from the ventral and dorsal processing streams. The various types of neural detectors in different sections of the hierarchy are shown. V1 and IT represent the primary visual cortex and the inferotemporal cortex, respectively; KO and STS represent the kinetic occipital cortex and the superior temporal sulcus, respectively. These and other abbreviations indicate the visual cortex in monkeys and humans [1]. The schematic of the model is presented here for both pathways. In the ventral processing stream, i.e., the form pathway, a set of Gabor filters is applied at different orientations, positions and phases; the outputs of V1 are the outcomes of quadrature-phase pairs, summed, squared and square-rooted. Then, the outputs of the filter are normalized with regard to the local population. The filter outcomes are subsequently max-pooled and summed across space. The MAX and SUM operations are based on the attained active basis of the object form. This initial part of the schematic occurs through the ABM [13] as a Gabor-based object recognition operation. Finally, the output of the ABM goes through IncSFA [32,14] for slow-feature extraction of the basis. The dorsal processing stream helps to obtain motion information throughout the high-dimensional input stream. The motion pathway is attained by optical flow. The average of these flow divisions within the episode (t0, t1, ..., tn) plays the role of the fast feature. However, each pathway can make its own decision with regard to categorization, as justified by the performance of the two patients (DF and RV) [31].

The response of each input was estimated throughout the time-series data by applying IncSFA for slowness and combining these data with motion information. The proposed method contributes to the fields of computational vision and intelligence. However, the biological concept and some related evidence have vital roles in modifying the computational part of our methodology due to the link between the biological visual system and the proposed mechanism. After discussing ABM, IncSFA is described along with the method for merging ABM and IncSFA in the form pathway. The optical flow and its division approach for the interaction between pathways are then explained, followed by a brief description of the kernel extreme learning machine (ELM) used for the decision-making step in the recognition process.

3.1. Active basis model

The ABM [13] uses Gabor wavelets (as the element dictionary) to build a deformable biological template. A shared sketch algorithm (SSA) is followed through AdaBoost. In each iteration, SSA follows matching pursuit and chooses a wavelet element, checking the object over a number of different orientations, locations and scales. By selecting a small number of elements from the dictionary for every image (sparse coding), the image can be represented as a linear combination of the previously described elements, considering ε as a minor residual:

I = \sum_{i=1}^{n} c_i \beta_i + \varepsilon    (1)

where B = (\beta_i, i = 1, \ldots, n) is a set of Gabor wavelet elements (components of sinusoids), c_i = \langle I, \beta_i \rangle, and \varepsilon is the unexplained residual of the image [13]. Using wavelet sparse coding, a large number of pixels is reduced to a small number of wavelet elements. Sparse coding can train natural image patches into a Gabor-like wavelet element dictionary, which carries the properties of simple cells in V1 [44]. The extraction of local shapes is performed separately for every frame [13], and the filter processes the images considering orientation and density. ABM also uses a Gabor filter bank, but in a different form. A Gabor wavelet dictionary comprising n directions and m scales has the form GW_j(\theta, \omega), j = 1, \ldots, m \times n, where \theta \in \{k\pi/n,\ k = 0, \ldots, n-1\} and \omega \in \{\sqrt{2}\, i,\ i = 1, \ldots, m\}.

The object forms contain small variance in size, location and posture, although the overall shape structure is considered to be maintained throughout the process of recognition. The response (convolution) of the image to each element offers form information with \theta and \omega:

B = \langle GW, I \rangle = \sum_x \sum_y GW(x_0 - x,\ y_0 - y;\ \omega_0, \theta_0)\, I(x, y).

GW_j is an [x_g, y_g] matrix and I is an [x_i, y_i] matrix, so the response of I to GW has size [x_i + x_g, y_i + y_g]. Therefore, before the convolution both matrices must be padded with sufficient zeros. Cropping the result can eliminate the consequences of the convolution. An additional approach is to shift the center of the frequencies (zero frequency) back to the center of the image, although this process might result in lost data. For the obtained training image set \{I_m, m = 1, \ldots, M\}, the joint sketch algorithm consecutively chooses B_i. It fundamentally identifies B_i so that its edge segments obtained from I_m become maximal [45]. It is then necessary to compute [I_m, \beta] = |\langle I_m, \beta \rangle|^2 for different i, where \beta \in Dictionary, together with sigmoid, whitening and thresholding transformations. Then [I_m, \beta] is maximized over all possible \beta. Let B = (\beta_i, i = 1, \ldots, n) be the template; for every training image I_m, the scoring is based on:

M(I_m, \Lambda) = \sum_{i=1}^{n} \delta_i\, |\langle I_m, \beta_i \rangle| - \log \Phi(\lambda \delta_i).    (2)

M is the match scoring function, \delta_i (selected from \sum_{m=1}^{M} [I_m, \beta]) addresses the step selection, and \Phi is a nonlinear function. The log-likelihood of the exponential model is obtained from the template matching score. The weight vectors were calculated by the maximum likelihood technique and are given by \Lambda = (\delta_i, i = 1, \ldots, n) [45].

\mathrm{Max}(x, y) = \max_{(x, y) \in D} M(I_m, \beta).    (3)

Max(x, y) calculates the maximum matching score previously obtained, and D represents the lattice of I. Here, there is no summation because the size is updated based on the training system for frame (t − 1). Moreover, the method tracks the object by applying motion features to obtain the displacement of the moving object.
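To make the greedy element selection behind Eqs. (2) and (3) concrete, the sketch below shows a simplified, matching-pursuit-style loop over a Gabor dictionary: sum the squared responses over the training frames (the SUM step), repeatedly pick the strongest element, and inhibit overlapping positions before the next pick. This is only an illustrative stand-in for the shared sketch algorithm, not the authors' implementation; the local MAX perturbation step of ABM is omitted, and the inhibition radius and array layout are assumptions.

```python
import numpy as np

def select_active_basis(responses, n_elements=40, inhibit_radius=3):
    """Greedy selection of Gabor elements (simplified shared-sketch step).

    responses: array of shape (M, K, H, W) holding |<I_m, beta>|^2 for
    M training images, K orientations, and an H x W lattice of positions.
    Returns a list of (orientation, y, x) triples with the highest summed
    scores, applying local inhibition after each selection.
    """
    pooled = responses.sum(axis=0)          # SUM over training images
    selected = []
    for _ in range(n_elements):
        k, y, x = np.unravel_index(np.argmax(pooled), pooled.shape)
        selected.append((int(k), int(y), int(x)))
        # Local inhibition: suppress nearby positions for all orientations
        y0, y1 = max(0, y - inhibit_radius), y + inhibit_radius + 1
        x0, x1 = max(0, x - inhibit_radius), x + inhibit_radius + 1
        pooled[:, y0:y1, x0:x1] = -np.inf
    return selected
```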

3.2. ABM and Gabor kernel

ABM is a supervised learning Gabor wavelet model that has been successfully used for object recognition tasks [13]. The Gabor wavelet has previously been widely used as a kernel in various applications, e.g., face recognition [47] or even action recognition models [2–4]. The Gabor wavelet (kernel filter) has been defined in previous works [47–50] as (Fig. 3):

\psi_{\mu,\nu}(z) = \frac{\|k_{\mu,\nu}\|^2}{\sigma^2}\, e^{-\|k_{\mu,\nu}\|^2 \|z\|^2 / 2\sigma^2} \left( e^{i k_{\mu,\nu} \cdot z} - e^{-\sigma^2/2} \right)    (4)

where \mu and \nu are the orientation and scale, respectively, of the Gabor kernels, z = (x, y), \|\cdot\| is the norm operator, and the wave vector k_{\mu,\nu} is defined as follows:

k_{\mu,\nu} = k_\nu e^{i\phi_\mu}    (5)

where k_\nu = k_{max}/f^{\nu} and \phi_\mu = \pi\mu/8. k_{max} is the maximum frequency, and f is the spacing factor between kernels in the frequency domain [51]. Unlike the Gabor wavelet kernel, for which the scales and orientations of the Gabor beams must be defined, these parameters in ABM are attained by training. It represents the image by the summation of the active basis families, which are obtained by the dictionary and the match scoring function. Let I(x, y) be the gray-level distribution of an image; the convolution of image I and a Gabor kernel \psi is defined as follows:

B_{z_i,\mu,\nu} = \psi_{\mu,\nu}(z) * I(z),\quad i = 1, 2, \ldots, n    (6)

where z = (x, y), * denotes the convolution operator, and B_{z_i,\mu,\nu} is the active basis that corresponds to match scoring at the proper orientation and scale. Consequently, the set S = \{B_{z_i,\mu,\nu} : \mu \in M, \nu \in O\} forms the Gabor wavelet representation of the image I(z), where M and O represent the orientations and scales of the Gabor wavelet dictionary. To include various spatial localities, spatial frequencies (scales) and orientation selectivities, we concatenated all depiction results and obtained an augmented feature vector X. X defines the set of active bases with the highest matching scores on the training sets, which are used together to form the object shape. This method prefers the integration of simple cells to make complex cells (Fig. 5).
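A bank of Gabor kernels following Eqs. (4) and (5) can be generated directly. The minimal sketch below builds complex kernels for 8 orientations and 5 scales; the constants sigma = 2*pi, k_max = pi/2 and f = sqrt(2) are commonly used illustrative choices, not values fixed by the paper.

```python
import numpy as np

def gabor_kernel(mu, nu, size=31, sigma=2 * np.pi,
                 k_max=np.pi / 2, f=np.sqrt(2)):
    """Complex Gabor kernel psi_{mu,nu} following Eqs. (4)-(5)."""
    k = k_max / (f ** nu)                    # k_nu = k_max / f^nu
    phi = np.pi * mu / 8.0                   # phi_mu = pi * mu / 8
    kx, ky = k * np.cos(phi), k * np.sin(phi)
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    sq = x ** 2 + y ** 2
    envelope = (k ** 2 / sigma ** 2) * np.exp(-k ** 2 * sq / (2 * sigma ** 2))
    carrier = np.exp(1j * (kx * x + ky * y)) - np.exp(-sigma ** 2 / 2)
    return envelope * carrier

# Example: a 5-scale, 8-orientation dictionary
bank = [gabor_kernel(mu, nu) for nu in range(5) for mu in range(8)]
```

Convolving an input frame with each kernel and taking the magnitude gives the per-orientation, per-scale responses used by the selection step sketched in Section 3.1.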

3.3. Slowness principle for ventral processing streams

The perception of SFA relates the high-dimensional input data (a 2D signal sequence) to an abstraction of the entire stream into an image [52]. The input signal usually shows variations in the environment, such as varying lighting conditions or noise levels, and the extraction of information from the sequence seems to be a challenging task. Slowness features can extract video attributes that vary only slightly over time. These features have recently entered computer vision [52,32,53] and are typically connected to the visual cortex [54,55]. An incremental learning algorithm is used because SFA is applied at each time step to an unknown video input. Incremental principal component analysis (IncPCA) is closely related to IncSFA [41,44] because the latter applies PCA and minor component analysis (MCA). Slowness features contain information regarding the active basis from the multidimensional input, while fast features can recognize the biological movement task. However, the active basis that is extracted by ABM and IncSFA only summarizes the sequence. SFA provides instantaneous scalar input-output functions that generate a 2D signal output carrying important information that changes as slowly as possible. SFA is an unsupervised learning method that extracts slowly changing features. This method provides a characteristic representation of the entire set and eliminates the unrelated details selected by the sensors [32,56,54]. Given that a high-dimensional video is used as input, an otherwise stationary room may be searched and the data may be encoded by combining position and direction with slow features [57]. SFA is typically stated as the following optimization problem: for an input x(t) of dimension D, x(t) = [x_1(t), \ldots, x_D(t)], find a set of functions g(x) = [g_1(x), \ldots, g_L(x)] of dimension L that produce the L-dimensional output y(t) = [y_1(t), \ldots, y_L(t)], with y_l(t) := g_l(x(t)), such that

\Delta_l := \Delta(y_l) := \langle \dot{y}_l^2 \rangle \ \text{is minimal}    (7)

under the constraints

\langle y_l \rangle = 0 \quad \text{(zero mean)},    (8)

\langle y_l^2 \rangle = 1 \quad \text{(unit variance)},    (9)

\forall d < l:\ \langle y_d\, y_l \rangle = 0 \quad \text{(decorrelation and order)}.    (10)

Constraints (8) and (9) prevent trivial constant solutions, and (10) enforces decorrelation among the features and orders them by slowness. Here \dot{y} denotes the temporal derivative of y and \langle\cdot\rangle denotes temporal averaging. The problem is then to identify the functions f(x) that generate the slowly varying output. For this problem, optimization by variational calculus, as in [19], is not applicable, but the problem becomes simple with the eigenvector method.

Fig. 2. A different approach to presenting the hierarchical model in terms of the theory and computation of the form (slow features) and motion (fast features) information is shown. ABM provides this property for human objects, and the computation of slow features is performed with IncSFA, an episodic approach that provides slow features of the basis within the episode, which are combined with optical flow division information. This creates an interaction between the pathways.

Fig. 3. A qualitative comparison of functional imaging experiments with ABM is revealed in the figure. The biological movements follow the research experiments of Johansson [46]: 10 light bulbs were located on the joints, and the actor was recorded performing complex movements. There is recognition of the action within the action episodes. In addition, the dots were spontaneously interpreted as a human. Similar to the point-light technique, which presents static pictures, ABM has a good representation of biological movements, and adding it into IncSFA can be a strong tool for selecting the best representative of the basis in the form pathway.

Considering that f_l is constrained to be a linear function consisting of a combination of a finite set of nonlinear functions p, the output function will be:

y_l(t) = f_l(x(t)) = w_l^T p(x(t)).    (11)

Then we will have z(t) = p(x(t)). Based on the changes previously incorporated, the optimization problem reduces to minimizing (12) over w_l:

\Delta(y_l) = \langle \dot{y}_l^2 \rangle = w_l^T \langle \dot{z}\dot{z}^T \rangle w_l.    (12)

If the p functions are selected such that z has a unit covariance matrix and a zero mean, then the three constraints are satisfied if and only if the weight vectors are orthonormal. Whitening is a very common technique used for this purpose. For whitening, the PCA of the input data must be calculated; given zero mean and an identity covariance matrix, x is mapped to z, and with this z the SFA problem is converted into a linear problem. Eq. (12) is then minimized by the eigenvectors of \langle \dot{z}\dot{z}^T \rangle with the L smallest eigenvalues. The desired features are obtained from this set of principal components. The objective was to calculate the temporal slowness (\Delta-value) of the features g(x) as instantaneous functions of the input 2D signal. This eigenvector-based algorithm is guaranteed to obtain the global optimum and learns biologically plausible rules for the existing optimization problem [54,58,59].


The modified optimization problem for high-dimensional visual input uses the information of biological movements and the human object through ABM. This pathway information is then combined with fast features with respect to the original model [1–4].
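As a concrete illustration of Eqs. (7)-(12), a batch linear SFA fits in a few lines: center and whiten the signal, then take the eigenvectors of the covariance of its temporal differences with the smallest eigenvalues. The sketch below is a minimal batch version for reference, not the incremental (IncSFA) variant actually used in the model; the rank threshold is an assumption.

```python
import numpy as np

def linear_sfa(x, n_slow=3):
    """Batch linear SFA. x: array (T, D) of input signals over time.
    Returns (projection matrix W of shape (D, n_slow), slow outputs y)."""
    x = x - x.mean(axis=0)
    # Whitening via PCA: z = x S, with S = V diag(1/sqrt(lambda))
    cov = np.cov(x, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    keep = eigval > 1e-10
    S = eigvec[:, keep] / np.sqrt(eigval[keep])
    z = x @ S
    # Minimize <zdot zdot^T>: eigenvectors with the smallest eigenvalues
    zdot = np.diff(z, axis=0)
    dcov = np.cov(zdot, rowvar=False)
    dval, dvec = np.linalg.eigh(dcov)      # ascending eigenvalues
    W = S @ dvec[:, :n_slow]               # map back to input space
    y = x @ W                              # slow features (zero mean, unit variance)
    return W, y
```

Expanding x nonlinearly (e.g., appending pairwise products) before calling linear_sfa corresponds to choosing the functions p in Eq. (11).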

3.4. ABM-based IncSFA

SFA is a non-linear decomposition method that is solved by twofold PCA and is entirely based on second-order statistics. Bray and Martinez [60] presented a kernel based on the temporal slowness principle by using the objective function of Stone [61], which is, in some ways, dissimilar to SFA. The SFA solution is applied through PCA and MCA and is closely related to IncPCA [45,62,32]. Schölkopf et al. [63] proposed kernel PCA, and a kernelled IncPCA must be considered in the case of IncSFA [64]. ABM can be applied as a subset of the Gabor filter for IncSFA to find the best representative basis of each action.
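IncSFA [14] replaces the two batch PCA steps with online updates: candid covariance-free incremental PCA (CCIPCA) for whitening and minor component analysis (MCA) on the derivative signal. The sketch below only conveys the flavor of such per-sample updates, with a CCIPCA-style principal-direction update and a normalized anti-Hebbian step that tracks a slow (minor) direction; it is a simplified stand-in for the actual algorithm, and the learning rate and the absence of deflation are assumptions.

```python
import numpy as np

def ccipca_step(v, x, eta=0.01):
    """One CCIPCA-style update of a principal-direction estimate v from sample x."""
    v_unit = v / (np.linalg.norm(v) + 1e-12)
    return (1.0 - eta) * v + eta * x * (x @ v_unit)

def minor_component_step(w, zdot, eta=0.01):
    """Anti-Hebbian update nudging w toward the slowest (minor) direction of the
    whitened derivative signal zdot; renormalization keeps ||w|| = 1."""
    w = w - eta * zdot * (zdot @ w)
    return w / (np.linalg.norm(w) + 1e-12)
```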

3.5. Motion information in the dorsal stream and interaction between pathways

In the motion pathway, biological movements are recognized by the dynamic patterns of optical flow. The optical flow identifies the movement pattern, which is consistent with the neurophysiological information from the hierarchy of neural detectors [1]. Some neurons for motion and direction selection can be found in the MT and V1 areas, respectively.

This motion information is extracted by applying layer-wise optical flow estimation. A mask that reveals the visibility of each layer mainly accounts for the difference between the estimations of traditional and layer-wise optical flows. The mask shape may be fractal and arbitrary, and matching only applies to those pixels that fall inside the mask [65]. We use the layer-wise optical flow method in [52], which builds on a previously described baseline optical flow algorithm [33–35]. M1 and M2 are visibility masks for two frames I1(t) and I2(t − 1), and the flow fields from I1 to I2 and from I2 to I1 are represented by (u1, v1) and (u2, v2). The following terms are considered for layer-wise optical flow estimation. The objective function is the sum of three parts: the visible layer masks are matched to the two images using a Gaussian filter, giving the data (matching) term E^{(i)}, the symmetric term E^{(i)}_{\delta}, and the smoothness term E^{(i)}_{s}:

E(u_1, v_1, u_2, v_2) = \sum_{i=1}^{2} \left( E^{(i)} + \lambda E^{(i)}_{\delta} + \gamma E^{(i)}_{s} \right).    (13)

We achieved a bidirectional flow after optimizing the objective function, using outer and inner fixed-point iterations, applying image warping, and performing a coarse-to-fine search. The compressed optic flow for all frames was calculated by directly matching the template to the earlier frame through the sum of absolute differences (l1-norm). Although optic flow is particularly noisy, no smoothing has been applied to this flow because the flow field becomes blurred at gaps, especially in those locations where the motion information is significant [32]. To determine the proper response of the optical flow with regard to its application in the proposed model, the optical flow is applied to adjust ABM and increase its efficiency. To achieve a reliable representation through the form pathway, the optic flow estimates the velocity and flow direction. The response of the filter based on the local matching of velocity and direction will be maximal as these two parameters continuously change. Optical flow division has been used for analyzing the interaction between the motion and form information [6]. The information of the optical flow division is also added to the representative of each action as calculated by ABM-based IncSFA. Such interaction confirms both pathways for each biological movement before this information is applied in the final decision-making procedure using ELM.
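The fast-feature computation can be prototyped with any dense optical flow estimator. The sketch below uses OpenCV's Farnebäck flow as an accessible stand-in for the layer-wise method of [65], splits each flow field into a fixed grid of divisions, and averages the flow inside each division over the episode (t0, ..., tn); the 4x4 grid and the Farnebäck parameters are assumptions for illustration only.

```python
import cv2
import numpy as np

def episode_flow_divisions(frames, grid=(4, 4)):
    """frames: list of same-size grayscale images (uint8). Returns an array of
    shape (grid_y, grid_x, 2) with the mean (dx, dy) flow per division over the episode."""
    h, w = frames[0].shape
    gy, gx = grid
    acc = np.zeros((gy, gx, 2), dtype=np.float64)
    for prev, curr in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        for i in range(gy):
            for j in range(gx):
                cell = flow[i * h // gy:(i + 1) * h // gy,
                            j * w // gx:(j + 1) * w // gx]
                acc[i, j] += cell.reshape(-1, 2).mean(axis=0)
    return acc / max(len(frames) - 1, 1)
```

The resulting per-division averages are the kind of episodic fast features that can be concatenated with the ABM-IncSFA slow features before classification.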


3.6. Extreme learning machine (ELM)

Artificial neural networks (ANN) have been widely utilized in several research areas because of their capability to estimate difficult nonlinear mappings directly from input samples and their ability to offer models for a large class of artificial and natural phenomena that are difficult to model using classical parametric techniques. Huang et al. (2004) proposed a novel learning algorithm for single hidden layer feed-forward neural networks, called ELM [66–68]. This algorithm avoids the problems caused by gradient-descent-based algorithms (e.g., the back-propagation used in ANNs). ELM considerably diminishes the amount of time required for training the neural network, shows faster learning and better generalization performance, requires fewer human interventions, and can run significantly faster than conventional techniques. ELM also automatically determines the parameters of the entire network, thereby preventing unimportant external interventions from humans, and is highly effective in real-time applications. Several advantages of ELM include its simple usage, fast learning speed, excellent generalization performance, and suitability for many nonlinear activation and kernel functions [69]. The single hidden layer feed-forward neural network (SLFN) with L hidden nodes [70,71] can be described mathematically in a unified form that integrates additive and sigmoid hidden nodes:

f_L(x) = \sum_{i=1}^{L} \beta_i G(a_i, b_i, x), \quad x \in \mathbb{R}^n,\ a_i \in \mathbb{R}^n    (14)

Let a_i and b_i represent the learning parameters of the hidden nodes and \beta_i the weight connecting the ith hidden node to the output node. G(a_i, b_i, x) is the output of the ith hidden node with respect to the input x. For an additive hidden node with activation function g(x): \mathbb{R} \to \mathbb{R} (e.g., sigmoid or threshold), G(a_i, b_i, x) is given by G(a_i, b_i, x) = g(a_i \cdot x + b_i), b_i \in \mathbb{R}, where a_i is the weight vector connecting the input layer to the ith hidden node and b_i is the bias of the ith hidden node. N arbitrary distinct samples are denoted by (x_i, t_i) \in \mathbb{R}^n \times \mathbb{R}^m, where x_i is an n-dimensional input vector and t_i an m-dimensional target vector. If the SLFN with L hidden nodes can approximate these N samples with zero error, there exist \beta_i, a_i and b_i such that

f_L(x_j) = \sum_{i=1}^{L} \beta_i G(a_i, b_i, x_j) = t_j, \quad j = 1, 2, \ldots, N.    (15)

The equation above can be written compactly as

H\beta = T    (16)

where H = H(a_1, \ldots, a_L;\ b_1, \ldots, b_L;\ x_1, \ldots, x_N), \beta = [\beta_1^T, \ldots, \beta_L^T]^T_{L \times m}, and T = [t_1^T, \ldots, t_N^T]^T_{N \times m}. H is the hidden-layer output matrix of the SLFN, whose ith column is the ith hidden node's output with respect to the inputs x_1, x_2, \ldots, x_N. \beta can be calculated as \beta = H^{\dagger}T, where H^{\dagger} is the Moore-Penrose generalized inverse of matrix H [72].
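Eqs. (14)-(16) translate directly into code: draw random hidden-layer parameters (a_i, b_i), build H, and solve beta = H†T with the Moore-Penrose pseudoinverse. The minimal sigmoid-node sketch below is a generic ELM for one-hot targets, not the authors' implementation; the kernel variants used for the final classification replace the random hidden layer with a kernel matrix and are not shown.

```python
import numpy as np

class ELM:
    """Minimal single-hidden-layer ELM with sigmoid additive nodes, Eqs. (14)-(16)."""
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # G(a_i, b_i, x) = g(a_i . x + b_i) with sigmoid g
        return 1.0 / (1.0 + np.exp(-(X @ self.A + self.b)))

    def fit(self, X, T):
        n_features = X.shape[1]
        self.A = self.rng.standard_normal((n_features, self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        self.beta = np.linalg.pinv(H) @ T          # beta = H^dagger T
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta

# Usage: elm = ELM().fit(X_train, T_onehot); scores = elm.predict(X_test)
```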

4. Results

Through computer programming and simulations, the proposed approach applies the detailed theoretical framework to several movement patterns in different environments that resemble typical experimental settings for system benchmarking. In the datasets section, two human movement datasets are introduced as a diverse biological movement paradigm. The simulation results are obtained by testing the two benchmark datasets. The results of the simulation for generating the slowest features in the ventral stream and those of the combination technique for recognizing biological movements are also presented. The following sections present the results for recognition accuracy and confusion matrices. The experimental results are extensively discussed to reveal the effectiveness and perspective of the biological movements model. The performance of the model in recognizing different biological movements is evaluated through model simulations and by comparing this model with other state-of-the-art methods.

4.1. Datasets

The KTH action dataset [73] is one of the largest human action datasets and includes 598 action sequences, which are composed of six types of single-individual actions: boxing, clapping, jogging, running, walking and waving. These actions were performed by 25 individuals in different conditions: outdoors (s1), outdoors with scale variations (s2), outdoors with different clothes (s3), and indoors with lighting variations (s4). Here, using down-sampling, the sequence resolution became 200 × 142 pixels. In our approach, we used 5 random cases (subjects) for training and designing the form and predefined templates. This means that for every action, five different cases were chosen to create action prototypes [5,29]. This provides the possibility of representing an action by only one image template and, ultimately, of template matching among the different actions. As discussed in the literature, KTH is a large dataset with robust intra-subject variation, although the camera shook during recording, which makes this database difficult to work with. In addition, it has four scenarios that are independent and separately trained and tested (i.e., four visually different databases that share the same classes), and both alternatives have been run. To account for the symmetry of human actions, a mirror function for sequences along the vertical axis was available for the testing and training sets. Here, all possible overlaps between human actions within the training and testing sets were considered (e.g., one video had 32 and 24 action frames). The Weizmann human action database [74] comprises nine types of single-individual actions and has 83 video streams showing individuals performing nine different actions: running, galloping sideways, jumping in place on two legs, walking, performing jumping jacks, jumping forward on two legs, waving one hand, waving two hands, and bending. We tracked and stabilized the figures using the background subtraction masks that come with this dataset. A sample frame of this dataset is shown in Fig. 4. The aforementioned datasets have been extensively used to estimate the performance of proposed techniques at recognizing biological movement examples. However, they concentrate on recognizing a single individual's action (e.g., clapping or walking). The other advantage of using these datasets is comparability with state-of-the-art approaches. For our testing datasets, we illustrated the experimental results using a kernelled ELM algorithm to classify with different kernel modes and performed a comparison with previous work that proposed biological human action models. We also compared the classifications between various kernels in the form pathway in terms of their accuracy. The proposed method is not efficient in terms of computational cost. The computational load is mainly due to feature extraction in the two pathways (form and motion features), applying ABM-based IncSFA and optical flow. Processing a new video requires time in our un-optimized MATLAB implementation, which is combined with existing codes for the motion and form pathways in MATLAB/C [32,13,70,71,65,24,75–77]. To provide an estimation of the computational time on an i7 2.80 GHz CPU, 64-bit operating system machine with 12 GB of memory (RAM) and our un-optimized codes under MATLAB R2016a, optical flow took around 0.7 s, and ABM and IncSFA took around 1 s and 0.02 s for 200 × 142 and 64 × 64 pixel input image resolutions, respectively. It is noticeable that the aforementioned times are estimated for only one cycle of the approaches for the paths (Fig. 6).

4.2. Results of the simulation for the slowness features in the form pathway

The results for the ventral stream must be oriented around the concepts of shape and form features. This orientation follows the original biological movement model [1] and many approaches that are based on the visual system.

The recognition of biological movement patterns in the form pathway depends on the slow features generated by IncSFA. The slowest features of the training set have been used as human action prototypes [5]. We generated multi-prototype predefined templates for each human action by applying IncSFA to the datasets. We divided every human movement sequence into training (55% of data) and testing (45% of data) sets. We used five random subjects from the training set for each action to obtain the action prototypes. These action prototypes are considered representative of the different biological movements [5,29].

The slowness features obtained through the application of IncSFA are shown in Fig. 8. The two datasets contain different categories of biological movements, so different slowness prototypes are required and the actions are not directly comparable. The KTH and Weizmann human action databases were used for benchmarking the performance of the proposed approach; thus, our experimental process should be consistent with that presented in [2,53,73]. We defined the set of training maps and tested the performance of the proposed technique on each dataset, with the four video scenarios used together (for the KTH dataset). The datasets were split into a set of training maps (55% of the dataset) with randomly selected subjects (five subjects) and a test portion with the residual subjects. These two sets did not overlap. IncSFA was then applied to the training sets and generated slow-feature prototypes that served as the form movement templates. We did not depend mainly on the training set, as we applied slow-feature prototypes. The ELM was trained and tested based on the splitting of datasets into 55% and 45% non-overlapping sets. The prototype selection was performed on the training set. The SFA in the ventral stream removes the need for complex computer vision techniques, such as "bag of words." The proposed approach involved the extraction of the slowest features from a set of image frames for every action. The ABM-based IncSFA in the model performed more successfully than other state-of-the-art computational methods. This approach also greatly outperformed methods such as "bag of words" and action keys, both of which might generate inaccurate results: the bag-of-words technique considers a set of locally selected patches and may ignore many structures, even though it has been acknowledged as a well-performing and efficient object recognition method (Figs. 8 and 9). The local distribution of the action sequence was very similar to the targeted action and differed from other sample sequences taken from different categories. The intraclass variance was generally large except when recognizing a single human action [53]. Consequently, the model showed a fair performance in the recognition of biological movements.
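One way to read "the slowest features of the training set are used as action prototypes" is to rank candidate templates by their Delta-value (Eq. (7)) and keep the smallest. The sketch below does exactly that; it is only an interpretation for illustration, not the authors' exact selection procedure.

```python
import numpy as np

def delta_value(y):
    """Temporal slowness of a 1-D signal y(t): mean squared derivative, as in Eq. (7)."""
    ydot = np.diff(y)
    return float(np.mean(ydot ** 2))

def pick_prototype(candidates):
    """candidates: list of 1-D arrays (one slow-feature trace per candidate template).
    Returns the index of the slowest (smallest Delta-value) candidate."""
    return int(np.argmin([delta_value(c) for c in candidates]))
```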

4.3. Simulation results for motion pathway

To implement the motion pathway, the proposed technique applied a common and noticeable tool, optical flow [65], to generate motion pathway information that is estimated by analyzing optical flow patterns [1]. This pathway contains the neural detector order for motion information, and optical flow features provide such information in our computational model, following neurophysiological data [78].


Fig. 4. The figure depicts the Weizmann and KTH human action datasets. To test the recognition of biological movements, two well-known human action recognition datasets were used. Here, the left set of image samples demonstrates actions from the Weizmann dataset; the second set, on the right-hand side, shows the KTH human action dataset. It is noticeable that the KTH dataset is one of the largest human action datasets, including six different human actions in four different scenarios.

Fig. 5. An explanatory diagram of ABM in the ventral processing stream [13], which represents the moving patterns and basis within the movement, is shown in the figure. It shows the application of the Gabor filter in a supervised object recognition method. It learns the object shape in the training stage within the action episode. (a) Representation of the Gabor filter bank at different scales and orientations. (b) Simulation results of training the ABM system for biological walking movements using the KTH human action recognition dataset. At the end, the walker's shape is presented at the top of the figure. (c) The processing diagram of the ABM process for identifying the human object is presented. The similarities between the method and biological findings at different levels have been discussed for the different stages. Overall, ABM has SUM and MAX operands, which construct the hierarchy from simple cells to complex cells and, at the end, the entire human object shape by active basis.

From a temporal variation perspective, the motion information here is considered a fast feature through which biological movements occur. These features were not generated as constant representative features throughout the whole episode but were focused on the temporal order within the current frame. In contrast to the form pathway, the motion pathway has temporal features, with each feature representing the motion information in a specific temporal movement. The optical flow pattern neurons were generated during action cycles and modeled similarly to the form pattern neurons from the form pathway. These two types of neurons primarily differed in terms of their temporal variations. The average optical flow throughout a biological movement is taken as a fast feature and added in the form of a coefficient to the form pathway results, which produces the interaction between the ventral and dorsal streams (combining the fast and slow features). Fig. 7 shows the motion pattern features throughout the action cycles that are integrated into the processing stream.

4.4. Evaluation of the interaction between two pathways

To analytically assess the model, more than 38,000 frame cuboid forms of different human biological action movements were prepared. Given that the human movement recognition model shows an episodic behavior due to its utilization of slowness features (IncSFA provides a better condition), using this algorithm for training analysis required many input frames, and the testing and evaluation of the model were therefore limited. The proposed models were evaluated based on how well their predictions matched the data. The proposed model assumes that the original models focus on slowness features and episodic processing. However, one cannot guarantee that the proposed model will reach a completely accurate level. The system was then trained and tested by computing the features in both pathways based on the previously described settings. For a specified test sequence, the action label was assigned to the action frames.


Fig. 6. Simulation results for a simple biological movement paradigm based on ABM-based IncSFA in the ventral processing stream are shown. Each row within the panel reveals the response of ABM during the episode as well as the slowness features generated for each different action. The first set of biological movements was obtained from the Weizmann human action recognition dataset [74], and the second group of biological movements shows the results for the KTH dataset [73]. Simulation results for the active basis through IncSFA follow the theoretical prediction with regard to the simplification of recognition using ABM-based IncSFA and its application in the ventral processing stream, opening a new perspective on the mechanism for the recognition of biological movements.

The proposed model correctly classified the majority of the actions (shown in the confusion matrices). The most frequent mistakes were found in distinguishing "running" from "jogging" and in distinguishing "boxing", "clapping," and "waving" [2,4,53]. Such difficulty can be attributed to the similarities among these movements, but the presented model dramatically diminishes such similarities via episodic learning. The results of each human movement scenario are presented in Tables 1 and 2.


Fig. 7. Simulation results with regard to the dorsal processing stream by applying optical flow [24] are shown. As an episodic operation occurs within the ventral processing stream, the form and motion information must be considered during the time that the ventral stream was performing (t0, tn). Each row represents an action. The images are presented in color, which can depict the flow in false colors. For the interaction between the ventral and dorsal streams, optical flow division has been used [6]. The different actions of these simulations are presented from the KTH human action recognition dataset [73]. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

Fig. 8. Confusion matrices of the proposed approach are presented and were obtained from the human action movements of the KTH dataset [73]. Three different kernels were used for classification with the ELM algorithm [70,71] in the decision-making and categorization of the biological movement. From left to right, the RBF kernel-ELM, the wavelet kernel-ELM, and the Sigmoid-ELM confusion matrices are depicted; the Sigmoid kernel-ELM produced better accuracy for the classification of biological movements.

Unlike previous methods that utilized the same datasets, the accuracy of the proposed approach was reported for each human movement scenario. However, this comparison is not precise because of differences in the experimental setups. These results are comparable with those of other state-of-the-art approaches despite the differences in their methodologies, such as being supervised or unsupervised, the presence or absence of tracking, the subtraction of the background, or the consideration of multiple action recognition. The proposed approach was then evaluated using the two human action datasets. Figs. 8 and 9 present the confusion matrices. The performance of the proposed model was compared with that of previous approaches that utilized the same datasets, as presented in Tables 1 and 2. The different methods listed in Table 1 show many variations in the experimental setups, such as the partitions of training and testing data, the pre-processing procedure (e.g., tracking and background subtraction), the presence or absence of supervision, and the per-frame classification performance. The results of our model were more stable than those of other models due to its adoption of the slowness principle and its combination with fast features. However, the comparisons of our model with other methods are not strictly fair, even though the other methods do not completely cover the biological perspective [73] (Fig. 10).


Fig. 9. Confusion matrices of the proposed approach are presented and were obtained from the human action movements of the Weizmann dataset [74]. Three different kernels were used for classification with the ELM algorithm [70,71] in the decision-making and categorization of the biological movement. From left to right, the RBF kernel-ELM, the wavelet kernel-ELM and the Sigmoid-ELM confusion matrices are depicted; the RBF and Sigmoid kernel-ELM produced better results, 97.5%, in the classification of biological movements.

Fig. 10. Confusion matrices for the two pathways separately represent the accuracy of each processing stream when no interaction between the pathways is considered. This representation was obtained from the KTH human action benchmarking dataset without interaction of the paths. It is inspired by the two patients, DF, who developed visual agnosia (i.e., damage to the ventrolateral occipital cortex), and RV, who developed optic ataxia (i.e., damage to the occipitoparietal cortex) [31].

Table 1
The recognition results using the proposed method are compared with state-of-the-art methods that utilized the KTH action dataset.

Methods                        Accuracy            Year
Schuldt et al. [73]            71.72%              2004
Niebles et al. [79]            83.33%              2008
Schindler and Van Gool [2]     92.7%               2008
Wang and Mori [80]             91.2%               2009
Danafar et al. [4]             93.1%               2010
Zhang and Tao [53]             U-SFA: 84.67%       2012
                               S-SFA: 88.83%
                               D-SFA: 91.17%
                               SD-SFA: 93.50%
Babu et al. [18]               90.44%              2015
Proposed method                90.07%              2017

Table 2
Comparison of the proposed approach and previous methods that utilized the Weizmann human action dataset.

Methods                        Accuracy            Year
Schuldt et al. [73]            72.8%               2004
Niebles et al. [79]            72.8%               2008
Schindler and Van Gool [2]     100%                2008
Wang and Mori [80]             100%                2009
Zhang and Tao [53]             U-SFA: 86.67%       2012
                               S-SFA: 86.40%
                               D-SFA: 89.33%
                               SD-SFA: 93.87%
Babu et al. [18]               85.56%              2015
Proposed method                97.5%               2017

5. Discussion

5.1. ABM and Gabor kernel in form pathway

ABM is motivated by applying the Olshausen and Field representation [44] to model image object category collections.

Although the Olshausen and Field model was proposed to explain the role of simple cells in the primary visual cortex (V1) (Fig. 5), Riesenhuber and Poggio's theory [37] suggests that local maximum pooling of simple cell responses is performed by V1 complex cells. Thus, the local perturbations of the orientations and locations of the linear basis elements in the Olshausen and Field model can be derived in the form of a deformable template from the active basis [81]. The local maximum pooling by Riesenhuber and Poggio serves as an active basis for deforming image data.


highly articulate shape representations, such as an and-or graph in a compositional framework [13,82].

The Gabor wavelet model is very similar to the receptive field profiles of cortical simple cells [63]. Kernel PCA previously overcame some restrictions of its linear characteristics by nonlinearly transferring the input space to a space of high-dimensional features. Kernel PCA derives a low-dimensional feature space and is considered nonlinear in the input space [63]. This technique originates from Cover's theorem on pattern separation in the input space, which posits that nonlinearly separable patterns are linearly distinguishable if the input space is nonlinearly converted into a high-dimensional feature space. From the computational perspective, kernel PCA follows the Mercer equivalence condition because inner products in the input space return the inner products in the high-dimensional feature space; the complexity of computation is related more to the number of training samples than to the feature space dimension [47]. We treat the application of ABM [5,28,6,29] as a subset of the Gabor wavelet kernel for IncSFA. A schematic model of the visual cortex motivated by the inference behind this model is shown in Fig. 2. The input stimulus is initially filtered by a set of Gabor wavelet filters with various phases, orientations, and positions; the quadrature-phase outputs are squared, summed, and square-rooted (the energy of V1) and then divided, with normalization and summation performed across orientations [41]. The operations of ABM are almost similar to and significantly consistent with this procedure. Given the recognition property of the supervised learning of objects, the ABM can identify and focus on human objects more precisely and robustly than the Gabor wavelet filter. ABM was proposed to analyze images in a more biologically significant manner than the Gabor filter itself [47,43,49,83–85]. The role of ABM is similar to that of the 2D receptive field in mammalian cortical simple and complex cells. The orientation and selectivity of these kernels exhibit desirable characteristics of spatial locality. These kernels are also spatially localized in the optimal positions and domains of frequency.
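For illustration only, the quadrature Gabor filtering and the V1-like energy computation outlined above can be sketched as follows. This is a minimal NumPy/SciPy example with illustrative filter parameters and helper names; it is not the implementation used in the paper.

import numpy as np
from scipy.signal import convolve2d

def gabor_pair(size=21, wavelength=8.0, theta=0.0, sigma=4.0):
    """Quadrature (even/odd phase) Gabor kernels at one orientation."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)        # rotate coordinates
    envelope = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    even = envelope * np.cos(2.0 * np.pi * xr / wavelength)
    odd = envelope * np.sin(2.0 * np.pi * xr / wavelength)
    return even, odd

def v1_energy(frame, n_orientations=16, wavelength=8.0):
    """Quadrature outputs are squared, summed, and square-rooted (V1 energy),
    then divisively normalized across orientations, as outlined in the text."""
    energies = []
    for k in range(n_orientations):
        even, odd = gabor_pair(theta=k * np.pi / n_orientations,
                               wavelength=wavelength)
        e = convolve2d(frame, even, mode="same")
        o = convolve2d(frame, odd, mode="same")
        energies.append(np.sqrt(e ** 2 + o ** 2))
    energies = np.stack(energies)                      # (orientations, H, W)
    norm = energies.sum(axis=0, keepdims=True) + 1e-8  # divisive normalization
    return energies / norm

# Usage: pooled orientation-energy features for one grayscale frame.
frame = np.random.rand(120, 160)
features = v1_energy(frame).sum(axis=(1, 2))           # summation across space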

5.2. Static and dynamic patterns

How does the proposed model with slow and fast processing streams recognize biological movement? The results suggest that ABM-based IncSFA combines form with motion (i.e., slow and fast information). The temporal features were calculated by two independent pathways that represent slow and fast features, or static and dynamic patterns. Therefore, the proposed model offers a different perspective of biological movements [8] by gathering sensory information over diverse time scales [86]. Consequently, the recognition task follows the original model (which shows a fair performance in the targeted databases [2–4]). The proposed approach is significantly motivated by the holonomic theory (particularly the form pathway) of Pribram, which is based on the fact that the dendritic receptive fields in sensory cortices are described mathematically by Gabor functions [87] that are widely used by ABM. The visual information is incrementally treated in a series of cortical stages (e.g., motion and orientation as local features in neurons at each level, such as V1 [88]). ABM-based IncSFA [13,14,32] (multiple SFA methods) extracts slow features for the incremental modeling of the form processing streams in the ventral stream. As previously discussed, the primary stage involves the use of local (in V1 cells) and model detectors (Gabor-like filters) in 16 (including 8 preferred) orientations, and the use of the proper scale depends on the receptive field [2,39]. However, an invariant behavior is inferred from the neuron response of the central nervous system [23] (e.g., early-vision complex-cell phase invariance [88] and hippocampal place and head-direction invariance cells [89]). Human action cycles are unlikely to have invariant poses that are independent from the environment, different lighting conditions, or action poses.


These invariant forms of actions can serve as important criteria that represent the form processing stream information. The slowness principle applies the perception and inferences of neurons that are trained by these invariant actions by favoring slowly changing outputs in 2D [89]. A good implementation of this principle is SFA, which minimizes the mean square of the temporal derivative of the output and serves as a fine method for identifying the physiological properties of complex cells in the visual cortex [26,55]. IncSFA considers the similarities between the view-tuned neurons in area IT and the snapshot neurons with regard to the independency of their scale and form. The proposed approach is modeled through the slowest features of the ventral stream, and these features are adjusted by using unsupervised learning methods. The fast features use and infer the optical flow, and by using the slowness information of the other pathway they achieve a high-level integration of neuron motion pattern information into the snapshot neuron outcomes.
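As a concrete illustration of the slowness objective, a minimal batch SFA sketch (not the incremental IncSFA variant used in this work, and with illustrative variable names) first whitens the input signal and then keeps the directions along which the temporal difference signal has the smallest variance:

import numpy as np

def batch_sfa(x, n_slow=2):
    """Batch slow feature analysis: find projections y = W (x - mean)
    with unit variance that minimize the mean squared temporal derivative."""
    x = x - x.mean(axis=0)
    # Whitening via PCA so that the projected signal has unit covariance.
    cov = np.cov(x, rowvar=False)
    d, e = np.linalg.eigh(cov)
    keep = d > 1e-10
    whitener = e[:, keep] / np.sqrt(d[keep])
    z = x @ whitener
    # Slow directions = smallest-eigenvalue directions of the derivative covariance.
    dz = np.diff(z, axis=0)
    d2, e2 = np.linalg.eigh(np.cov(dz, rowvar=False))
    order = np.argsort(d2)[:n_slow]
    return whitener @ e2[:, order]       # columns project input to slow features

# Usage on a toy signal: a slowly varying component hidden in a fast mixture.
t = np.linspace(0, 4 * np.pi, 500)
x = np.column_stack([np.sin(t) + 0.1 * np.sin(40 * t), np.sin(40 * t)])
w = batch_sfa(x, n_slow=1)
slow = (x - x.mean(axis=0)) @ w          # recovers the slow sinusoid up to sign/scale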

Based on several proposed neurophysiologically plausible models for approximating local motion [100–103,92,1], the first level of the motion pathway comprises the correspondence detectors for local motion, which include direction-selective neurons [91] and motion-selective neurons [104]. In the simulation stage, the temporal optical flow patterns were directly calculated, and the motion-sensitive neuron responses were computed with realistic physiological parameters [104] using several receptive field sizes at different levels. The second level of the motion pathway had larger receptive fields for the local flow structure, which induces the movement of stimuli (more neurophysiological evidence is mentioned in Table 3). The selective flow translation and motion pattern neurons correspond to bandpass or low tuning by considering their speed [92]. In the original model, four direction neuron populations are typically preferred, and local optical flow detectors are considered for the motion edges [1]. The output signals were calculated by combining two nearby sub-fields with contradictory preferred directions. The optical flow pattern neurons in the third step of the motion pathway correspond to the snapshot neurons in the other pathway. Optical flow pattern neurons were identified at different locations of the visual cortex (i.e., STS or fusiform and occipital face areas).
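For concreteness, the fast (per-frame) motion features of the dorsal stream can be sketched with a dense optical flow estimator. OpenCV's Farneback method is used here only as a stand-in for the layer-wise optical flow cited in the paper, and the grid pooling is an illustrative choice rather than the original configuration.

import cv2
import numpy as np

def fast_motion_features(frames, grid=(4, 4)):
    """Dense optical flow between consecutive frames, pooled on a coarse grid
    into per-frame 'fast feature' vectors (mean horizontal/vertical flow per cell)."""
    feats = []
    prev = cv2.cvtColor(frames[0], cv2.COLOR_BGR2GRAY)
    for frame in frames[1:]:
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Farneback parameters: pyr_scale, levels, winsize, iterations, poly_n, poly_sigma, flags
        flow = cv2.calcOpticalFlowFarneback(prev, curr, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        h, w = flow.shape[:2]
        gh, gw = grid
        pooled = [flow[i * h // gh:(i + 1) * h // gh,
                       j * w // gw:(j + 1) * w // gw].mean(axis=(0, 1))
                  for i in range(gh) for j in range(gw)]
        feats.append(np.concatenate(pooled))   # length = 2 * gh * gw per frame pair
        prev = curr
    return np.asarray(feats)

# feats = fast_motion_features(list_of_bgr_frames)   # shape: (n_frames - 1, 32)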

5.3. Interaction between pathways

The mechanism for recognizing biological movements involves two independent pathways. The information of these two pathways interacts at several levels of the mammalian brain [93,94]. The proposed approach presented an interaction between pathways by combining the results of IncSFA with the flow information (similar to [6]) and follows supporting evidence to generate the final recognition results. Such interaction simplifies the aggregation of the model (at the STS level, for instance [98]) and improves its performance. Holonomical features consider both pathways for predefined action templates. In the form pathway, the proposed approach follows the holonomic theory of the brain [87,90]. The dendritic receptive fields in sensory cortexes are mathematically described by using Gabor functions [73] and are largely used by ABM [13]. As previously discussed, the primary stage involves the use of local (in V1 cells) and model detectors (Gabor-like filters) in 16 (including 8 preferred) orientations and a scale that depends on the receptive field [38,39]. ABM also acts as a snapshot detector in the human body shape model that is consistent with the IT area (inferotemporal cortex) of monkeys, where the view-tuned neurons are located. The model for complex shape tuning [99] is implemented through ABM-based IncSFA. Specifically, SFA represents the performance of the view-tuned neurons in the IT area and that of the snapshot neurons to achieve an independency in scale and position. The proposed model follows this modeling and is adjusted through training with key frames.


Table 3
Neurophysiological connection to our proposed approach.

Relevant neurophysiological evidence

Connection | Neurophysiological point | Implemented by | Approach
Dorsal stream | Motion and direction selection in MT and V4. | Optical flow | Dow et al. (1981) [39], Eifuku & Wurtz (1998) [40]
Ventral stream | Gabor-like filters in the ventral stream and dendritic receptive fields in sensory cortices described by Gabor functions. | ABM | Pribram (1991) [87], Perus and Loo (2011) [90]
Ventral stream & V1 | Local maximum pooling of simple cell responses is performed in V1 complex cells. | ABM-IncSFA | Riesenhuber and Poggio's theory (1999) [37]
Ventral stream | The local perturbations of orientations and locations of linear basis elements in the Olshausen and Field model can be derived in the form of a deformable template from the active and linear basis. | Action prototypes, IncSFA | Yuille et al. (1992) [81]
V1 | Motion and orientation as local features. | Optical flow | Hubel and Wiesel (1968) [88]
Dorsal stream | The first level of the motion pathway has local motion detectors and includes direction-selective neurons. | Optical flow | Smith and Snowden (1994) [91]
Dorsal stream | Motion-selective neurons of the components in MT. | Optical flow | Giese and Poggio (2003) [1]
Dorsal stream | The receptive field's size is within the range of the direction selection neurons in V1 and (foveal neurons) MT. | Layer-wise optical flow | Hubel and Wiesel (1968) [88], Giese and Poggio (2003) [1]
Dorsal stream | The second level has larger receptive fields for the local flow, which induces the movement of stimuli (neurons of motion pattern in area MT with bandpass or low tuning by considering their speed). | Optical flow (pyramid) | Rodman and Albright (1989) [92]
Dorsal stream | Four direction neurons for local optical flow detectors are considered for the motion edges. | Optical flow | Giese and Poggio (2003) [1]
Dorsal stream | The areas MT, MSTd, and MSTl correspond to the motion-selective neurons. | Optical flow | Giese and Poggio (2003) [1], Dow et al. (1981) [39], Eifuku & Wurtz (1998) [40]
Interaction | The mechanism involves two independent pathways which interact at several levels of the mammalian brain. | ABM-IncSFA & optical flow | Kourtzi and Kanwisher (2000) [93], Saleem et al. (2000) [94]
Interaction | Straight cross-connection or feed-forward connection. | ABM-IncSFA & optical flow | Cloutman (2013) [95]
Interaction | Interactions between top-down and bottom-up processing. | ABM-IncSFA & optical flow | Laycock et al. (2007) [96], Cloutman (2013) [95], Kruger et al. (2013) [97]
Interaction | The interaction between pathways can be simply performed by aggregating the information in the model (e.g., at the STS level) to improve recognition performance. | Concatenation of ABM-IncSFA & optical flow in ELM | Giese and Vaina (2001) [98], Kruger et al. (2013) [97]
Interaction | The form signal affects motion processing, which helps the snapshot detectors in finding the human body shape. | Main effect of ABM-IncSFA in ELM | Mather et al. (2013) [33]
Interaction | Interactions in the inferotemporal cortex of monkeys, where the view-tuned neurons are located and the model of complex shape tuning. | Optical flow division | Logothetis et al. (1995) [99]


The results of the optical flow are integrated with the form information, which covers a high level of interaction between the snapshot neurons (static patterns) and the motion pattern neurons (dynamic patterns). The ABM and IncSFA for form information summarization are based on neurobiological, neuro-computational, and theoretical evidence [52,1,2]. The local direction is organized at the initial level of the form pathway, and Gabor-like modeling detector methods (i.e., ABM) achieve fair constancy by modeling the cells as described in the aforementioned section [43]. We can generate 16 directions and 2 spatial scales by using the mechanism of the proposed neurophysiologically plausible model. We also use two differentiators to gather information about the local direction of the pathway and complex-like cells with independent form features that are appropriate for the form pathway.
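A minimal sketch of the final decision stage described in this section: the slow (ABM-IncSFA) and fast (optical flow) descriptors are concatenated per sample and classified with an RBF kernel ELM in its standard closed form. The helper names, the regularization constant C, and the kernel width are illustrative assumptions, not the exact configuration of this work.

import numpy as np

def rbf_kernel(a, b, gamma=0.1):
    """Gaussian (RBF) kernel matrix between row-wise sample sets a and b."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

class KernelELM:
    """Kernel extreme learning machine, closed-form solution
    beta = (I/C + Omega)^-1 T, where Omega is the training kernel matrix."""
    def __init__(self, C=100.0, gamma=0.1):
        self.C, self.gamma = C, gamma

    def fit(self, x, labels):
        self.x_train = x
        targets = np.eye(labels.max() + 1)[labels]          # one-hot targets
        omega = rbf_kernel(x, x, self.gamma)
        self.beta = np.linalg.solve(np.eye(len(x)) / self.C + omega, targets)
        return self

    def predict(self, x):
        return rbf_kernel(x, self.x_train, self.gamma) @ self.beta

# Slow (ABM-IncSFA) and fast (optical flow) descriptors are concatenated per clip.
slow_feats = np.random.rand(30, 8)      # placeholder slow features
fast_feats = np.random.rand(30, 32)     # placeholder fast features
labels = np.random.randint(0, 6, 30)    # e.g., six action classes
clf = KernelELM().fit(np.hstack([slow_feats, fast_feats]), labels)
pred = clf.predict(np.hstack([slow_feats, fast_feats])).argmax(axis=1)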

5.4. Comparison with related works

The proposed approach shows a considerably high accuracy compared with extant approaches, most of which show similar accuracy ranges. Table 1 compares the accuracy of our approach with that of some state-of-the-art methods that are applied to the KTH human action database. The accuracy of the proposed approach is relatively comparable with that of the fairly accurate methods of Schindler and Van Gool [2], Wang and Mori [80], Danafar et al. [4], and Zhang and Tao [53]. The methods of Danafar [4] and Zhang [53] show great similarity with the proposed method in terms of biological motivation. Table 2 compares the accuracy of the proposed approach with that of some state-of-the-art methods that are applied to the Weizmann human action database. Schindler and Van Gool [2] and Wang and Mori [80] proposed extremely accurate approaches. The former shows a considerable accuracy for both human action databases but fails to consider the temporal features in its configuration. The method of Zhang and Tao [53] is the first SFA approach to be used for recognizing human actions, yet it is limited by its poor biological configuration according to biological evidence.

In terms of mechanism configuration, the proposed developments in the model are based on the models of Giese and Poggio [1], Schindler and Van Gool [2], and Danafar et al. [4], all of which have Gabor filters and optical flow in their ventral and dorsal streams. Our approach greatly improves the model by using an object-based ventral stream via ABM and by achieving an interaction between optical flow and object recognition via the optical flow division technique. However, the model of Danafar et al. [2] shows an increased computational time for each frame. Moreover, the high-


dimensional reproducing kernel Hilbert space data transformation of this model shows a favorable degree of interaction. The proposed approach strives to achieve this data interaction advantage by combining the motion and form information for ELM classification. The interaction between the slow and fast features, which is not present in the approaches of Zhang and Tao [53] (which used slow features) or Wang and Mori [80] (which used optical flow), provides a new perspective for developing the computational model.

5.5. Our contribution

The major contribution of this work lies in its application of computer vision as a biologically inspired approach and as a principal constituent of soft computing (similar to [105,106]) by integrating IncSFA into the mechanism for recognizing biological movements. Gabor wavelet-like filters have been previously applied in the ventral pathway [1–3]; here, ABM (a supervised Gabor-based method for object recognition [75]) [5,42,107] is integrated into the ventral stream for recognizing the shape of movements. The proposed approach modifies the recognition mechanism by applying IncSFA, which eliminates the non-robust behavior of ABM in finding form information in the ventral stream, and strives to overcome the limitation in the approach of Zhang and Tao [53] via a modified configuration. Moreover, the definition of the slow and fast feature concepts for the recognition mechanism increases the novelty of this approach.

6. Conclusion

This paper develops a mechanism for recognizing biological movements in the visual system. The form and motion pathways are represented in the ventral and dorsal processing streams. Following the slowness principle, we use ABM-based IncSFA to extract form information from the human movement shape and use IncSFA to extract the slowest features in this pathway. The optical flow generates the motion information and extracts the fast features. The form information is abstracted by using IncSFA and is compared with the action prototypes afterward. However, the motion information involves an optical flow division that achieves an interaction between the two pathways by aggregating information. For the decision-making process, a kernel ELM recognizes the actions using the information of both pathways as its attributes. Integrating IncSFA into the ventral stream and involving slow and fast features in the recognition mechanism are the major contributions of this research. The model analyzes and modifies the previous mechanism for recognizing biological movements by combining fast and slow features. The integration of these features shows a favorable performance in recognizing human actions, and such performance is evaluated by using the KTH and Weizmann human action datasets for benchmarking. Future studies on recognition capability may modify the dual pathways by using dictionary learning approaches.

Acknowledgments

The authors would like to thank the anonymous reviewers and the editorial board of the Applied Soft Computing journal (especially Bas van Vlijmen) for their constructive comments, which helped us improve the quality of the paper; Ce Liu of the Computer Science and Artificial Intelligence Laboratory (CSAIL), Massachusetts Institute of Technology, for the layer-wise optical flow code; Ying Nian Wu of UCLA (University of California, Los Angeles) for the active basis model code; Matthew Luciw of the Swiss AI Lab IDSIA (Istituto Dalle Molle di Studi sull'Intelligenza Artificiale) for the MATLAB code for incremental slow feature analysis; and Guang-Bin Huang and Zhou Hongming of Nanyang Technological University for their guidance in using ELM. We acknowledge Paria Yousefi from the School of Computer Science, Bangor University, for her help and comments. This research was done under the High Impact Research (HIR) foundation grant, contract No. UM.C/HIR/MOHE/FCSIT/10, at the University of Malaya (UM), Malaysia.

References

[1] M.A. Giese, T. Poggio, Neural mechanisms for the recognition of biological movements, Nat. Rev. Neurosci. 4 (3) (2003) 179.
[2] K. Schindler, L. Van Gool, Action snippets: How many frames does human action recognition require? in: IEEE Conference on Computer Vision and Pattern Recognition, 2008, CVPR 2008, IEEE, 2008, pp. 1–8.
[3] S. Danafar, A. Giusti, J. Schmidhuber, Novel kernel-based recognizers of human actions, EURASIP J. Adv. Signal Process. 2010 (1) (2010) 202768.
[4] S. Danafar, A. Gretton, J. Schmidhuber, Characteristic kernels on structured domains excel in robotics and human action recognition, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Springer, 2010, pp. 264–279.
[5] B. Yousefi, C.K. Loo, Slow feature action prototypes effect assessment in mechanism for recognition of biological movement ventral stream, Int. J. Bio-Inspired Comput. 8 (6) (2016) 410–424.
[6] B. Yousefi, C.K. Loo, Development of biological movement recognition by interaction between active basis model and fuzzy optical flow division, Sci. World J. (2014).
[7] E.H. Adelson, J.R. Bergen, Spatiotemporal energy models for the perception of motion, JOSA A 2 (2) (1985) 284–299.
[8] S. Shioiri, P. Cavanagh, ISI produces reverse apparent motion, Vision Res. 30 (5) (1990) 757–768.
[9] S. Shioiri, K. Matsumiya, Motion mechanisms with different spatiotemporal characteristics identified by an MAE technique with superimposed gratings, J. Vision 9 (5) (2009) 30.
[10] D. Regan, Orientation discrimination for objects defined by relative motion and objects defined by luminance contrast, Vision Res. 29 (10) (1989) 1389–1400.
[11] Y.M. Marghi, F. Towhidkhah, S. Gharibzadeh, A two level real-time path planning method inspired by cognitive map and predictive optimization in human brain, Appl. Soft Comput. 21 (2014) 352–364.
[12] H. Sprekeler, C. Michaelis, L. Wiskott, Slowness: an objective for spike-timing-dependent plasticity? PLoS Comput. Biol. 3 (6) (2007) e112.
[13] Y.N. Wu, Z. Si, H. Gong, S.-C. Zhu, Learning active basis model for object detection and recognition, Int. J. Comput. Vision 90 (2) (2010) 198–235.
[14] V.R. Kompella, M. Luciw, J. Schmidhuber, Incremental slow feature analysis: adaptive low-complexity slow feature updating from high-dimensional input streams, Neural Comput. 24 (11) (2012) 2994–3024.
[15] E. Vats, C.S. Chan, Early detection of human actions – a hybrid approach, Appl. Soft Comput. 46 (2016) 953–966.
[16] E.P. Ijjina, C.K. Mohan, Hybrid deep neural network model for human action recognition, Appl. Soft Comput. 46 (2016) 936–952.
[17] Y. Zhang, Z. Zhang, Y. Deng, S. Mahadevan, A biologically inspired solution for fuzzy shortest path problems, Appl. Soft Comput. 13 (5) (2013) 2356–2363.
[18] R.V. Babu, B. Rangarajan, S. Sundaram, M. Tom, Human action recognition in H.264/AVC compressed domain using meta-cognitive radial basis function network, Appl. Soft Comput. 36 (2015) 218–227.
[19] A. Johnston, D.H. Arnold, S. Nishida, Spatially localized distortions of event time, Curr. Biol. 16 (5) (2006) 472–479.
[20] K. Moutoussis, S. Zeki, A direct demonstration of perceptual asynchrony in vision, Proc. R. Soc. Lond. B: Biol. Sci. 264 (1380) (1997) 393–399.
[21] D. Whitney, I. Murakami, Latency difference, not spatial extrapolation, Nat. Neurosci. 1 (8) (1998) 656–657.
[22] A.O. Holcombe, Seeing slow and seeing fast: two limits on perception, Trends Cogn. Sci. 13 (5) (2009) 216–221.
[23] H. Sprekeler, C. Michaelis, L. Wiskott, Slowness: an objective for spike-timing-dependent plasticity? PLoS Comput. Biol. 3 (6) (2007) e112.
[24] C. Liu, Optical Flow MATLAB/C++ Code, cited in Sec. 6 (6.4), 2012, pp. 1.
[25] S. Becker, G.E. Hinton, Self-organizing neural network that discovers surfaces in random-dot stereograms, Nature 355 (6356) (1992) 161.
[26] P. Berkes, L. Wiskott, Slow feature analysis yields a rich repertoire of complex cell properties, J. Vision 5 (6) (2005) 9.
[27] K.P. Körding, C. Kayser, W. Einhäuser, P. König, How are complex cell properties adapted to the statistics of natural stimuli? J. Neurophysiol. 91 (1) (2004) 206–212.
[28] B. Yousefi, C.K. Loo, Bio-Inspired Human Action Recognition Using Hybrid Max-Product Neuro-Fuzzy Classifier and Quantum-Behaved PSO, 2016, arXiv:1509.03789.
[29] B. Yousefi, C.K. Loo, A. Memariani, Biological inspired human action recognition, in: 2013 IEEE Workshop on Robotic Intelligence in Informationally Structured Space (RiiSS), IEEE, 2013, pp. 58–65.
[30] T. Guthier, V. Willert, J. Eggert, Topological sparse learning of dynamic form patterns, Neural Comput. (2015).


[31] M.A. Goodale, J.P. Meenan, H.H. Bülthoff, D.A. Nicolle, K.J. Murphy, C.I. Racicot, Separate neural pathways for the visual analysis of object shape in perception and prehension, Curr. Biol. 4 (7) (1994) 604–610.
[32] V.R. Kompella, M.D. Luciw, J. Schmidhuber, Incremental Slow Feature Analysis, 2012.
[33] G. Mather, A. Pavan, R.B. Marotti, G. Campana, C. Casco, Interactions between motion and form processing in the human visual system, Front. Comput. Neurosci. 7 (2013).
[34] S. Thorpe, D. Fize, C. Marlot, Speed of processing in the human visual system, Nature 381 (6582) (1996) 520.
[35] M. Oram, D. Perrett, Integration of form and motion in the anterior superior temporal polysensory area (STPa) of the macaque monkey, J. Neurophysiol. 76 (1) (1996) 109–129.
[36] G. Johansson, Spatio-temporal differentiation and integration in visual motion perception, Psychol. Res. 38 (4) (1976) 379–393.
[37] M. Riesenhuber, T. Poggio, Hierarchical models of object recognition in cortex, Nat. Neurosci. 2 (11) (1999).
[38] M. Riesenhuber, T. Poggio, Neural mechanisms of object recognition, Curr. Opin. Neurobiol. 12 (2) (2002) 162–168.
[39] B. Dow, A. Snyder, R. Vautin, R. Bauer, Magnification factor and receptive field size in foveal striate cortex of the monkey, Exp. Brain Res. 44 (2) (1981) 213–228.
[40] S. Eifuku, R.H. Wurtz, Response to motion in extrastriate area MSTl: center-surround interactions, J. Neurophysiol. 80 (1) (1998) 282–296.
[41] K.N. Kay, J. Winawer, A. Rokem, A. Mezer, B.A. Wandell, A two-stage cascade model of BOLD responses in human visual cortex, PLoS Comput. Biol. 9 (5) (2013) e1003079.
[42] A. Memariani, C.K. Loo, Biologically inspired dictionary learning for visual pattern recognition, Informatica 37 (4) (2013).
[43] J.P. Jones, L.A. Palmer, An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex, J. Neurophysiol. 58 (6) (1987) 1233–1258.
[44] B.A. Olshausen, D.J. Field, Emergence of simple-cell receptive field properties by learning a sparse code for natural images, Nature 381 (6583) (1996) 607.
[45] A. Levey, M. Lindenbaum, Sequential Karhunen–Loeve basis extraction and its application to images, IEEE Trans. Image Process. 9 (8) (2000) 1371–1374.
[46] G. Johansson, Visual perception of biological motion and a model for its analysis, Percept. Psychophys. 14 (2) (1973) 201–211.
[47] C. Liu, Gabor-based kernel PCA with fractional power polynomial models for face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 26 (5) (2004) 572–581.
[48] M. Lades, J.C. Vorbruggen, J. Buhmann, J. Lange, C. Von Der Malsburg, R.P. Wurtz, W. Konen, Distortion invariant object recognition in the dynamic link architecture, IEEE Trans. Comput. 42 (3) (1993) 300–311.
[49] J.P. Jones, L.A. Palmer, An evaluation of the two-dimensional Gabor filter model of simple receptive fields in cat striate cortex, J. Neurophysiol. 58 (6) (1987) 1233–1258.
[50] J.G. Daugman, Two-dimensional spectral analysis of cortical receptive field profiles, Vision Res. 20 (10) (1980) 847–856.
[51] B. Moghaddam, Principal manifolds and probabilistic subspaces for visual recognition, IEEE Trans. Pattern Anal. Mach. Intell. 24 (6) (2002) 780–788.
[52] S. Liwicki, S. Zafeiriou, M. Pantic, Incremental slow feature analysis with indefinite kernel for online temporal video segmentation, in: Asian Conference on Computer Vision, Springer, 2012, pp. 162–176.
[53] Z. Zhang, D. Tao, Slow feature analysis for human action recognition, IEEE Trans. Pattern Anal. Mach. Intell. 34 (3) (2012) 436–450.
[54] M. Franzius, H. Sprekeler, L. Wiskott, Slowness and sparseness lead to place, head-direction, and spatial-view cells, PLoS Comput. Biol. 3 (8) (2007) e166.
[55] L. Wiskott, T.J. Sejnowski, Slow feature analysis: unsupervised learning of invariances, Neural Comput. 14 (4) (2002) 715–770.
[56] T. Zito, N. Wilbert, L. Wiskott, P. Berkes, Modular toolkit for data processing (MDP): a Python data processing framework, Front. Neuroinform. 2 (2009).
[57] I.T. Jolliffe, Principal component analysis and factor analysis, in: Principal Component Analysis, Springer, 1986, pp. 115–128.
[58] W. Hashimoto, Quadratic forms in natural images, Netw. Comput. Neural Syst. 14 (4) (2003) 765–788.
[59] H. Sprekeler, C. Michaelis, L. Wiskott, Slowness: an objective for spike-timing-dependent plasticity? PLoS Comput. Biol. 3 (6) (2007) e112.
[60] A. Bray, D. Martinez, Kernel-based extraction of slow features: complex cells learn disparity and translation invariance from natural images, in: Advances in Neural Information Processing Systems, 2003, pp. 269–276.
[61] J.V. Stone, Learning perceptually salient visual parameters using spatiotemporal smoothness constraints, Neural Comput. 8 (7) (1996) 1463–1492.
[62] D.A. Ross, J. Lim, R.-S. Lin, M.-H. Yang, Incremental learning for robust visual tracking, Int. J. Comput. Vision 77 (1) (2008) 125–141.
[63] B. Schölkopf, A. Smola, K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. 10 (5) (1998) 1299–1319.
[64] H. Nickisch, Extraction of Visual Features from Natural Video Data Using Slow Feature Analysis, 2006.
[65] C. Liu, Beyond Pixels: Exploring New Representations and Applications for Motion Analysis (Ph.D. thesis), Massachusetts Institute of Technology, 2009.

[66] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: a new learning scheme of feedforward neural networks, in: 2004 IEEE International Joint Conference on Neural Networks, Proceedings, vol. 2, IEEE, 2004, pp. 985–990.

[67] G.-B. Huang, D.H. Wang, Y. Lan, Extreme learning machines: a survey, Int. J. Mach. Learn. Cybern. 2 (2) (2011) 107–122.
[68] G.-B. Huang, Q.-Y. Zhu, C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (1) (2006) 489–501.
[69] R. Rajesh, J.S. Prakash, Extreme learning machines - a review and state-of-the-art, Int. J. Wisdom Based Comput. 1 (1) (2011) 35–49.
[70] N.-Y. Liang, G.-B. Huang, P. Saratchandran, N. Sundararajan, A fast and accurate online sequential learning algorithm for feedforward networks, IEEE Trans. Neural Netw. 17 (6) (2006) 1411–1423.
[71] G.-B. Huang, L. Chen, C.K. Siew, Universal approximation using incremental constructive feedforward networks with random hidden nodes, IEEE Trans. Neural Netw. 17 (4) (2006) 879–892.
[72] K. Banerjee, Generalized Inverse of Matrices and Its Applications, 1973.
[73] C. Schuldt, I. Laptev, B. Caputo, Recognizing human actions: a local SVM approach, in: Proceedings of the 17th International Conference on Pattern Recognition, ICPR 2004, vol. 3, IEEE, 2004, pp. 32–36.
[74] L. Gorelick, M. Blank, E. Shechtman, M. Irani, R. Basri, Actions as space-time shapes, IEEE Trans. Pattern Anal. Mach. Intell. 29 (12) (2007) 2247–2253.
[75] Y. Wu, Z. Si, H. Gong, S. Zhu, Active Basis Model, Shared Sketch Algorithm, and Sum-Max Maps for Representing, Learning and Recognizing Deformable Templates, Tech. Rep., 2009.
[76] V.R. Kompella, M. Luciw, J. Schmidhuber, Incremental slow feature analysis: adaptive low-complexity slow feature updating from high-dimensional input streams, Neural Comput. 24 (11) (2012) 2994–3024.
[77] http://www.ntu.edu.sg/home/egbhuang/elm codes.html (2012).
[78] K. Tanaka, Inferotemporal cortex and object vision, Annu. Rev. Neurosci. 19 (1) (1996) 109–139.
[79] J.C. Niebles, H. Wang, L. Fei-Fei, Unsupervised learning of human action categories using spatial-temporal words, Int. J. Comput. Vision 79 (3) (2008) 299–318.
[80] Y. Wang, G. Mori, Human action recognition by semilatent topic models, IEEE Trans. Pattern Anal. Mach. Intell. 31 (10) (2009) 1762–1774.
[81] A.L. Yuille, P.W. Hallinan, D.S. Cohen, Feature extraction from faces using deformable templates, Int. J. Comput. Vision 8 (2) (1992) 99–111.
[82] S.-C. Zhu, D. Mumford, A stochastic grammar of images, Found. Trends Comput. Graph. Vision 2 (4) (2007) 259–362.
[83] J.G. Daugman, Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters, JOSA A 2 (7) (1985) 1160–1169.
[84] J.G. Daugman, Complete discrete 2-D Gabor transforms by neural networks for image analysis and compression, IEEE Trans. Acoust. Speech Signal Process. 36 (7) (1988) 1169–1179.
[85] S. Marcelja, Mathematical description of the responses of simple cortical cells, JOSA 70 (11) (1980) 1297–1300.
[86] U. Hasson, E. Yang, I. Vallines, D.J. Heeger, N. Rubin, A hierarchy of temporal receptive windows in human cortex, J. Neurosci. 28 (10) (2008) 2539–2550.
[87] K.H. Pribram, Brain and Perception: Holonomy and Structure in Figural Processing, Psychology Press, 1991.
[88] D.H. Hubel, T.N. Wiesel, Receptive fields and functional architecture of monkey striate cortex, J. Physiol. 195 (1) (1968) 215–243.
[89] R.U. Muller, E. Bostock, J.S. Taube, J.L. Kubie, On the directional firing properties of hippocampal place cells, J. Neurosci. 14 (12) (1994) 7235–7251.
[90] M. Perus, C.K. Loo, Holonomic brain processes, in: Biological and Quantum Computing for Human Vision: Holonomic Models and Applications, IGI Global, 2011, pp. 19–45.
[91] A.T. Smith, R.J. Snowden, Visual Detection of Motion, Academic Press, 1994.
[92] H. Rodman, T. Albright, Single-unit analysis of pattern-motion selective properties in the middle temporal visual area (MT), Exp. Brain Res. 75 (1) (1989) 53–64.
[93] Z. Kourtzi, N. Kanwisher, Activation in human MT/MST by static images with implied motion, J. Cogn. Neurosci. 12 (1) (2000) 48–55.
[94] K. Saleem, W. Suzuki, K. Tanaka, T. Hashikawa, Connections between anterior inferotemporal cortex and superior temporal sulcus regions in the macaque monkey, J. Neurosci. 20 (13) (2000) 5083–5101.
[95] L.L. Cloutman, Interaction between dorsal and ventral processing streams: where, when and how? Brain Lang. 127 (2) (2013) 251–263.
[96] R. Laycock, S. Crewther, D.P. Crewther, A role for the ‘magnocellular advantage’ in visual impairments in neurodevelopmental and psychiatric disorders, Neurosci. Biobehav. Rev. 31 (3) (2007) 363–376.
[97] N. Kruger, P. Janssen, S. Kalkan, M. Lappe, A. Leonardis, J. Piater, A.J. Rodriguez-Sanchez, L. Wiskott, Deep hierarchies in the primate visual cortex: what can we learn for computer vision? IEEE Trans. Pattern Anal. Mach. Intell. 35 (8) (2013) 1847–1871.
[98] M.A. Giese, L.M. Vaina, Pathways in the analysis of biological motion: computational model and fMRI results, Percept. ECVP Abstr. 30 (2001).
[99] N.K. Logothetis, J. Pauls, T. Poggio, Shape representation in the inferior temporal cortex of monkeys, Curr. Biol. 5 (5) (1995) 552–563.
[100] S. Grossberg, E. Mingolla, L. Viswanathan, Neural dynamics of motion integration and segmentation within and across apertures, Vision Res. 41 (19) (2001) 2521–2553.
[101] M.E. Sereno, Neural Computation of Pattern Motion: Modeling Stages of Motion Analysis in the Primate Visual Cortex, MIT Press, 1993.
[102] S.J. Nowlan, T.J. Sejnowski, A selection model for motion processing in area MT of primates, J. Neurosci. 15 (2) (1995) 1195–1214.

[103] E.P. Simoncelli, D.J. Heeger, A model of neuronal responses in visual area MT, Vision Res. 38 (5) (1998) 743–761.
[104] M.A. Giese, T. Poggio, Neural mechanisms for the recognition of biological movements, Nat. Rev. Neurosci. 4 (3) (2003) 179.
[105] W.S. Liew, M. Seera, C.K. Loo, E. Lim, Affect classification using genetic-optimized ensembles of fuzzy ARTMAPs, Appl. Soft Comput. 27 (2015) 53–63.
[106] T. Obo, C.K. Loo, M. Seera, N. Kubota, Hybrid evolutionary neuro-fuzzy approach based on mutual adaptation for human gesture recognition, Appl. Soft Comput. 42 (2016) 377–389.
[107] C.-K. Loo, A. Memariani, Object recognition using sparse representation of overcomplete dictionary, in: Neural Information Processing, Springer, 2012, pp. 75–82.