
Research Article: Driving Posture Recognition by Joint Application of Motion History Image and Pyramid Histogram of Oriented Gradients

Chao Yan,1 Frans Coenen,2 and Bailing Zhang1

1 Department of Computer Science and Software Engineering, Xi'an Jiaotong-Liverpool University, SIP, Suzhou 215123, China
2 Department of Computer Science, University of Liverpool, Liverpool L69 3BX, UK

Correspondence should be addressed to Bailing Zhang; bailing.zhang@xjtlu.edu.cn

Received 3 August 2013; Accepted 30 October 2013; Published 28 January 2014

Academic Editor: Aboelmagd Noureldin

Copyright © 2014 Chao Yan et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

In the field of intelligent transportation systems (ITS), automatic interpretation of a driver's behavior is an urgent and challenging topic. This paper studies vision-based driving posture recognition in the human action recognition framework. A driving action dataset was prepared by a side-mounted camera looking at a driver's left profile. The driving actions, including operating the shift lever, talking on a cell phone, eating, and smoking, are first decomposed into a number of predefined action primitives, that is, interaction with the shift lever, operating the shift lever, interaction with the head, and interaction with the dashboard. A global grid-based representation for the action primitives is emphasized, which first generates the silhouette shape from the motion history image, followed by application of the pyramid histogram of oriented gradients (PHOG) for a more discriminating characterization. The random forest (RF) classifier is then exploited to classify the action primitives, together with comparisons to some other commonly applied classifiers such as kNN, multilayer perceptron, and support vector machine. Classification accuracy is over 94% for the RF classifier in holdout and cross-validation experiments on the four manually decomposed driving actions.

1 Introduction

In China, the number of personal-use automobiles has continued to grow at a rapid rate, reaching 120,890,000 in 2012. According to the World Health Organization (WHO), there are an estimated 250,000 deaths due to road accidents every year, making it the leading cause of death for people aged 14 to 44. Unsafe and dangerous driving accounts for more than one million deaths and over 50 million serious injuries worldwide each year [1]. The WHO also estimates that traffic accidents cost the Chinese economy over $21 billion each year. One of the key contributing factors is reckless driving [1]. It is a proven fact that drivers who are reaching for an object such as a cell phone are three times more likely to be involved in a motor vehicle accident, while actually using a cell phone increases the risk to six times as likely.

In order to reduce unsafe driving behaviors, one of the proposed solutions is to develop a camera-based system to monitor the activities of drivers. This is particularly relevant for long-distance truck and bus drivers. For example, in many countries, including China, it is illegal for drivers to use their cell phone whilst driving, and drivers who violate the restriction face civil penalties. However, automatically distinguishing between safe and unsafe driving actions is not a trivial technical issue. Since most commercial drivers operate alone, most of their driving behaviors are not directly observable by others. Such barriers will disappear when in-vehicle technologies become available to observe driver behaviors. An emerging technology that has attracted wide attention is the development of driver alertness monitoring systems, which aim at measuring driver status and performance to provide in-vehicle warnings and feedback to drivers. Truck and bus fleet managers are particularly interested in such systems for sound safety management: they can regularly track their drivers' outcomes and help prevent crashes, incidents, and violations.

Vision-based driving activity monitoring is closely related to human action recognition (HAR), which is an important area of computer vision research and applications.

Hindawi Publishing Corporation, International Journal of Vehicular Technology, Volume 2014, Article ID 719413, 11 pages. http://dx.doi.org/10.1155/2014/719413

The goal of action recognition is to classify image sequences into human actions based on the temporality of video images. Much progress has been made on how to distinguish actions in daily life using cameras and machine learning algorithms. HAR has no unique definition; it changes depending on the level of abstraction. Moeslund et al. [2] proposed a taxonomy of action primitives, actions, and activities. An action primitive is a very basic movement that can be described at the decomposed level. An action is composed of action primitives and describes a cyclic or whole-body movement. Activities consist of a sequence of actions performed by one or more participants. In the recognition of drivers' actions, the context is usually not taken into account, for example, the background environment variation outside the window and interactions with another person or moving object. Accordingly, this paper only focuses on partial body movements of the driver.

There exist some works on driver activity monitoring. To monitor a driver's behavior, some of the works focused on the detection of driver alertness through monitoring the eyes, face, head, or facial expressions [3–7]. In one study, the driver's face was tracked and yaw orientation angles were used to estimate the driver's face pose [8]. The Fisherface approach was applied by Watta et al. to represent and recognise the driver's seven poses, including looking over the left shoulder, in the left rear-view mirror, at the road ahead, down at the instrument panel, at the centre rear-view mirror, at the right rear-view mirror, or over the right shoulder [9]. In order to minimize the influence of varying illumination and background, Kato et al. used a far infrared camera to detect the driver's face direction, such as leftward, frontward, and rightward [10]. Cheng et al. presented a combination of thermal infrared and color images with multiple cameras to track important driver body parts and to analyze driver activities such as steering the car forward, turning left, and turning right [11]. Veeraraghavan et al. used the driver's skin-region information to group two actions: grasping the steering wheel and talking on a cell phone [12, 13]. Zhao et al. extended and improved Veeraraghavan's work to recognise four driving postures, that is, grasping the steering wheel, operating the shift lever, eating, and talking on a cell phone [14, 15]. Tran et al. studied drivers' behaviors by foot gesture analysis [16]. Other works focused on capturing the driver's attention by combining different vision-based features and the physical status of the vehicle [17–22].

The task of driver activity monitoring can generally be studied in the human action recognition framework, the emphasis of which is often on finding good feature representations that tolerate variations in viewpoint, human subject, background, illumination, and so on. There are two main categories of feature descriptions: global descriptions and local descriptions. The former consider the visual observation as a whole, while the latter describe the observation as a collection of independent patches or local descriptors. Generally, a global representation is derived from silhouettes, edges, or optical flow. One of the earliest global representation approaches, called motion history image (MHI), was proposed by Bobick and Davis [23]; it extracts silhouettes using background subtraction and aggregates the differences between subsequent frames in an action sequence. Other global description methods include the R transform [24], contour-based approaches [25, 26], and optical flow [27–30]. The weaknesses of global representations include sensitivity to noise, partial occlusions, and variations in viewpoint. In contrast, local representations describe the observation as a collection of space-time interest points [31] or local descriptors, which usually do not require accurate localisation or background subtraction. Local representations have the advantage of being invariant to differences in viewpoint, appearance of the person, and partial occlusions. A representative local descriptor is the space-time interest point detector proposed by Laptev and Lindeberg [31], which, however, has the shortcoming that only a small number of stable interest points are available in practice. Some derivations have been proposed, for example, extracted space-time cuboids [32].

In this paper, we studied drivers' activity recognition by comprehensively considering action detection, representation, and classification. Our contributions include three parts. The first is our deviation from many published works on drivers' postures based on static images from drivers' action sequences, which have the potential problem of confusion caused by similar postures. It is quite possible that two visually similar posture frames are extracted from two completely different action image sequences. For example, the moment (frame) at which a driver moves a cell phone across his or her mouth can be confused with eating. Following the action definition in [2], which is based on the combination of basic movements, we regard driving activity as a space-time action instead of a static, space-limited posture. The main driving activities we considered are hand-conducted actions, such as eating and using a cell phone.

The second contribution of this paper is our proposal of driving action decomposition. Generally, the driving actions that take place in the driver's seat are mainly performed by hand, which include but are not limited to eating, smoking, talking on the cell phone, and operating the shift lever. These actions or activities are usually performed by shifting the hand position, which is confined to the driver's seat. Following the train of thought in [2], we regard the actions or activities as a combination of a number of basic movements or action primitives. We created a driving action dataset, similar to the SEU dataset [14], with four different types of action sequences including operating the shift lever, responding to a cell phone call, eating, and smoking. The actions are then decomposed into four action primitives, that is, hand interaction with the shift lever, hand operating the shift lever, hand interaction with the head, and hand interaction with the dashboard. Upon the classification of these action primitives, driving actions involving eating, smoking, and other abnormal behaviors can accordingly be recognised as a combination of action primitives [33].

The last contribution of this paper is the proposal of a global grid-based representation for the driving actions, which is a combination of the motion history image (MHI) [23] and the pyramid histogram of oriented gradients (PHOG) [34], and the application of the random forest (RF) classifier for driving action recognition. Encoding the region of interest in the driver's seat is a natural choice, as there is little noise and no partial occlusion in the video. The action silhouettes were first extracted to represent action primitives by applying MHI to aggregate the differences between subsequent frames. To achieve better discrimination than MHI alone, the pyramid histogram of oriented gradients of the MHI was calculated as the feature for further training and classification. PHOG is a spatial pyramid extension of the histogram of oriented gradients (HOG) [35] descriptor, which has been used extensively in computer vision. After comparing several different classification algorithms, the random forest (RF) classifier was chosen for driving action recognition, as it offers satisfactory performance.

The rest of the paper is organized as follows. Section 2 gives a brief introduction to the creation of our driving posture dataset and the necessary preprocessing. Sections 3 and 4 review the motion history image and the pyramid histogram of oriented gradients, respectively, with an explanation of how they are applied in driving posture description. Section 5 introduces the random forest classifier and three other commonly used classification methods for comparison. Section 6 reports the experimental results, followed by the conclusion in Section 7.

2. Driving Action Dataset Creation and Preprocessing

A driving action dataset was prepared, containing 20 video clips at 640 × 424, 24 fps. The video was recorded using a Nikon D90 camera at a car park of Xi'an Jiaotong-Liverpool University. Ten male drivers and ten female drivers participated in the experiment by pretending to drive in the car and conducting several actions that simulated real driving situations. Five predefined driving actions were imitated, that is, turning the steering wheel, operating the shift lever, eating, smoking, and using a cell phone.

There are five steps involved in simulating the driving activities by each participant.

Step 1. A driver first grasps the steering wheel and slightly turns it.

Step 2. The driver's right hand moves to the shift lever and operates it several times before moving back to the steering wheel.

Step 3. The driver takes a cell phone from the dashboard, responds to a phone call, and then puts it back with his or her right hand.

Step 4. The driver takes a cookie from the dashboard and eats it using his or her right hand.

Step 5. For male drivers, the driver takes a cigarette from the dashboard, puts it into his mouth, uses a lighter to light the cigarette, and then puts it back on the dashboard.

This experiment extracted twenty consecutive picture sequences from the video of the dataset for further experimentation.

Figure 1: The vertical axis stands for the area of difference points compared to the previous frame, obtained by applying Otsu's thresholding method [26]; the horizontal axis stands for the frame number.

2.1. Action Detection and Segmentation. Similar to many intelligent video analysis systems, action recognition should start with motion detection in a continuous video stream, for which many approaches are available. Among the popular approaches, frame differencing is the simplest and most efficient method; it involves taking the difference between two frames to detect the object. Frame differencing is widely applied with proven performance, particularly when a fixed camera is used to observe dynamic events in a scene.

For each driving action sequence, the frame differences between two adjacent image frames are first calculated, followed by a thresholding operation to identify moving objects. Otsu's thresholding method [26] was chosen, which minimizes the intraclass variance of the black and white pixels. The existence of moving objects is determined by evaluating whether there exist connected regions in the binary image, and the location of the moving objects can be further calculated from the total areas and coordinates of the connected regions. The details are illustrated by Figure 1.
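The detection step above can be sketched in a few lines. The following is a minimal pure-Python illustration (not the paper's code; function names are ours), operating on frames flattened to lists of 8-bit intensities; a practical system would use NumPy/OpenCV instead.

```python
def otsu_threshold(pixels):
    """Otsu's method: the threshold minimising intra-class variance,
    found by maximising the equivalent between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_b = 0.0   # cumulative intensity sum of the background class
    w_b = 0       # background pixel count
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue
        w_f = total - w_b
        if w_f == 0:
            break
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        between = w_b * w_f * (mean_b - mean_f) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def motion_mask(prev_frame, curr_frame):
    """Absolute frame difference, binarised at Otsu's threshold."""
    diff = [abs(a - b) for a, b in zip(prev_frame, curr_frame)]
    t = otsu_threshold(diff)
    return [1 if d > t else 0 for d in diff]
```

Connected regions would then be extracted from the resulting binary mask to locate the moving object.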

In the following section, the segmented action images are further manually labelled into four different categories of action primitives based on the trajectory of the driver's right hand, as shown in Figure 2. The first category is moving to the shift lever with the right hand from the steering wheel, or moving back to the steering wheel from the shift lever. The second category is operating the shift lever with the right hand. The third category is moving to the dashboard from the steering wheel with the right hand, or moving back to the steering wheel from the dashboard with the right hand. The fourth category is moving to the head from the steering wheel, or moving back to the steering wheel from the head with the right hand.

3. Motion History Image (MHI)

The motion history image (MHI) approach is a view-based temporal template approach developed by Bobick and Davis [23], which is simple but robust in the representation of movements and is widely employed in action recognition, motion analysis, and other related applications [36–38]. The motion history image (MHI) describes how the motion evolves in the image sequence. Another representation, called the motion energy image (MEI), demonstrates the presence of any motion or spatial pattern in the image sequence. Together, the MHI and MEI templates comprise the temporal template-matching method.

Figure 3 shows a movement in driving. The first row contains some key frames of a driving action. The second and third rows are the frame differences and the corresponding binary images from applying Otsu's thresholding. The fourth and fifth rows are the cumulative MEI and MHI images, respectively. An MHI's pixel intensity is a function of the recency of motion in a sequence, where brighter values correspond to more recent motion. We currently use a simple replacement and linear decay operator on the binary difference frames. The formal definitions are briefly explained below:

$$\operatorname{diff}(x, y, t) = \begin{cases} 0 & \text{if } t = 1 \\ \left| I(x, y, t-1) - I(x, y, t) \right| & \text{otherwise,} \end{cases} \tag{1}$$

where diff(x, y, t) is a difference image sequence indicating the difference compared to the previous frame. Let

$$D(x, y, t) = \begin{cases} 0 & \text{if } \operatorname{diff}(x, y, t) < \text{threshold} \\ 1 & \text{otherwise,} \end{cases} \tag{2}$$

where D(x, y, t) is a binary image sequence indicating regions of motion. Then the motion energy image is defined as

$$\operatorname{MEI}(x, y, t) = \bigcup_{t = \text{action start frame}}^{t = \text{action end frame}} D(x, y, t). \tag{3}$$

Both motion history images and motion energy images were introduced to capture motion information in images [23]. While the MEI only indicates where the motion occurs, the motion history image MHI(x, y, t) represents how the object moves, and can be defined as

$$\operatorname{MHI}(x, y, t) = \begin{cases} 255 & \text{if } D(x, y, t) = 1 \\ \max\left\{0, \operatorname{MHI}(x, y, t-1) - 1\right\} & \text{if } D(x, y, t) \neq 1,\ \dfrac{255}{\text{pic seq length}} \leq 1 \\ \max\left\{0, \operatorname{MHI}(x, y, t-1) - \left\lfloor \dfrac{255}{\text{pic seq length}} \right\rfloor\right\} & \text{if } D(x, y, t) \neq 1,\ \dfrac{255}{\text{pic seq length}} > 1, \end{cases} \tag{4}$$

where pic seq length is the number of frames in the picture sequence.

The result is a scalar-valued image where the most recently moving pixels are the brightest. The MHI can represent the location, the shape, and the movement direction of an action in a picture sequence. As the MEI can be obtained by thresholding the MHI above zero, we will only consider features derived from the MHI in the following.
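The replace-and-decay update of (4) can be sketched as follows. This is an illustrative pure-Python version working on flattened single-channel frames (lists of ints); variable names are ours, and a real implementation would operate on NumPy arrays.

```python
import math

def update_mhi(prev_mhi, motion_mask, seq_length):
    """One MHI step (Eq. 4): moving pixels jump to 255, others decay."""
    decay = 1 if 255 / seq_length <= 1 else math.floor(255 / seq_length)
    return [255 if m == 1 else max(0, h - decay)
            for h, m in zip(prev_mhi, motion_mask)]

def compute_mhi(binary_masks):
    """Fold update_mhi over a sequence of binary motion masks D(x, y, t)."""
    n = len(binary_masks)
    mhi = [0] * len(binary_masks[0])
    for mask in binary_masks:
        mhi = update_mhi(mhi, mask, n)
    return mhi
```

With this decay schedule, a pixel that moved at the first frame of the sequence fades to near zero by the last frame, which is what makes the MHI encode the direction of motion.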

After the driving actions were detected and segmented from the raw video dataset, motion history images were extracted for each of the four decomposed action sets. Figure 4 demonstrates how the motion history image is calculated to represent movements for each decomposed action sequence. In the figure, the left column and the middle column are the start and end frames of a decomposed action snippet, respectively. The right column is the MHI calculated for the corresponding action snippet.

4. Pyramid Histogram of Oriented Gradients (PHOG)

The motion history image (MHI) is not appropriate for direct use as a feature for comparison or classification in practical applications. In the basic MHI method [23], after calculating the MHI and MEI, feature vectors are computed from the seven high-order Hu moments, and these feature vectors are then used for recognition. However, Hu's moment invariants have some drawbacks, particularly limited recognition power [39]. In this paper, the histogram of oriented gradients feature is extracted from the MHI as a more suitable feature for classification.

In many image processing tasks, the local geometrical shapes within an image can be characterized by the distribution of edge directions, called histograms of oriented gradients (HOG) [35]. HOG can be calculated by evaluating a dense grid of well-normalized local histograms of image gradient orientations over the image windows. HOG has some important advantages over other local shape descriptors; for example, it is invariant to small deformations and robust to outliers and noise.

The HOG feature encodes the gradient orientations of an image patch without considering where these orientations originate within the patch. Therefore, it is not discriminative enough when the spatial layout of the underlying structure of the image patch is important. The objective of a newly proposed improved descriptor, the pyramid histogram of oriented gradients (PHOG) [34], is to take the spatial property of the local shape into account while representing an image by HOG. The spatial information is represented by tiling the image into regions at multiple resolutions, based on spatial pyramid matching [40]. Each image is divided into a sequence of increasingly finer spatial grids by repeatedly doubling the number of divisions along each axis. The number of points in each grid cell is then recorded.


Figure 2: Four manually decomposed action primitives. (a) Hand interaction with shift lever (frames 640–660 and 1150–1170). (b) Hand operating the shift lever (frames 794–830 and 701–709). (c) Hand interaction with head (frames 3318–3340 and 2792–2811). (d) Hand interaction with dashboard (frames 4617–4638 and 4109–4134).

The number of points in a cell at one level is simply the sum over those contained in the four cells it is divided into at the next level, thus forming a pyramid representation. The cell counts at each level of resolution are the bin counts for the histogram representing that level. The soft correspondence between two point sets can then be computed as a weighted sum over the histogram intersections at each level.

The resolution of an MHI image is 640 × 480. An MHI is divided into small spatial cells based on different pyramid levels. We follow the practice in [34] by limiting the number of levels to L = 3 to prevent overfitting. Figure 5 shows that the pyramid at level l has 2^l × 2^l cells.

The magnitude m(x, y) and orientation θ(x, y) of the gradient at a pixel (x, y) are calculated as follows:

$$m(x, y) = \sqrt{g_x(x, y)^2 + g_y(x, y)^2}, \qquad \theta(x, y) = \arctan \frac{g_y(x, y)}{g_x(x, y)}, \tag{5}$$

where g_x(x, y) and g_y(x, y) are the image gradients along the x and y directions. Each gradient orientation is quantized into K bins. In each cell of every level, the gradients over all the pixels are accumulated to form a local K-bin histogram. As a result, a ROI at level l is represented as a K · 2^l · 2^l dimensional vector. All the cells at the different pyramid levels are combined to form a final PHOG vector of dimension $d = K \sum_{l=0}^{L} 4^{l}$ representing the whole ROI.

The dimension of the PHOG feature (e.g., d = 680 when K = 8 and L = 3) is relatively high. Many dimension reduction methods can be applied to alleviate this problem. We employ the widely used principal component analysis (PCA) [41] due to its simplicity and effectiveness.

5. Random Forest (RF) and Other Classification Algorithms

Random forest (RF) [42] is an ensemble classifier using many decision tree models, which can be used for classification or regression. A special advantage of RF is that accuracy and variable importance information are provided with the results. A random forest creates a number of classification trees. When a vector representing a new object is input for classification, it is sent to every tree in the forest. A different subset of the training data is selected (≈2/3), with replacement, to train each tree, and the remaining training data are used to estimate error and variable importance. Class assignment is made by the number of votes from all of the trees.

Figure 3: Example of the driver's right hand moving to the shift lever from the steering wheel (key frames 640, 644, 648, 652, 656). The first row shows key frames I(x, y, t) of a driving action. The second row shows the corresponding frame-difference images diff(x, y, t). The third row shows the binary images D(x, y, t) resulting from thresholding. The fourth row shows the cumulative motion energy images MEI(x, y, t). The fifth row shows the cumulative motion history images MHI(x, y, t).

RF has only two hyperparameters: the number of variables M in the random subset at each node and the number of trees T in the forest [42]. Breiman's RF error rate depends on two factors: the correlation between any pair of trees and the strength of each individual tree in the forest. Increasing the correlation increases the forest error rate, while increasing the strength of the individual trees decreases the misclassification rate. Reducing M reduces both the correlation and the strength. M is often set to the square root of the number of inputs.

When the training set for a particular tree is drawn by sampling with replacement, about one-third of the cases are left out of the sample set.

The RF algorithm can be summarised as follows.

(1) Choose the parameter T, the number of trees to grow.

(2) Choose the parameter m, the number of variables used to split each node, with m ≪ M, where M is the number of input variables; m is held constant while growing the forest.

(3) Grow T trees. When growing each tree, do the following.

(i) Construct a bootstrap sample of size n, sampled with replacement from S_n = {(X_i, y_i)}, i = 1, …, n, and grow a tree from this bootstrap sample.

(ii) When growing the tree, at each node select m variables at random and use them to find the best split.

(iii) Grow the tree to maximal extent; there is no pruning.

(4) To classify a point X, collect the votes from every tree in the forest and then use majority voting to decide on the class label.
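The four steps above can be illustrated with a deliberately tiny sketch. To stay short, each "tree" is reduced to a one-split decision stump on a median threshold; a real RF grows full, unpruned trees (in practice one would use a library such as scikit-learn's RandomForestClassifier). All names here are ours, not the paper's.

```python
import random
from collections import Counter

def majority(labels):
    return Counter(labels).most_common(1)[0][0]

def train_stump(sample, m, n_features, rng):
    """Step 3(ii): try m random features; keep the best single median split."""
    best = None
    for f in rng.sample(range(n_features), m):
        t = sorted(x[f] for x, _ in sample)[len(sample) // 2]
        left = [y for x, y in sample if x[f] <= t]
        right = [y for x, y in sample if x[f] > t]
        if not left or not right:
            continue
        ly, ry = majority(left), majority(right)
        errors = sum((ly if x[f] <= t else ry) != y for x, y in sample)
        if best is None or errors < best[0]:
            best = (errors, f, t, ly, ry)
    if best is None:  # degenerate sample: fall back to the majority label
        label = majority([y for _, y in sample])
        return lambda x: label
    _, f, t, ly, ry = best
    return lambda x: ly if x[f] <= t else ry

def random_forest(data, T=15, m=2, seed=1):
    """Steps 1-4: bootstrap T stumps and classify by majority vote."""
    rng = random.Random(seed)
    n, n_features = len(data), len(data[0][0])
    trees = []
    for _ in range(T):
        sample = [data[rng.randrange(n)] for _ in range(n)]  # step 3(i)
        trees.append(train_stump(sample, m, n_features, rng))
    return lambda x: majority([t(x) for t in trees])  # step 4: vote
```

Even with stumps, the bootstrap-plus-voting structure already shows why the ensemble is more stable than any single tree.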

In this paper, we also compare the accuracy of RF with several popular classification methods, including the k-nearest neighbor (kNN) classifier, the multilayer perceptron (MLP), and the support vector machine (SVM), on the driving action datasets.

5.1. Other Classification Methods

5.1.1. k-Nearest Neighbor Classifier. The k-nearest neighbour (kNN) classifier, one of the most classic and simplest classifiers in machine learning, classifies an object based on the minimal distance to training examples in feature space, by a majority vote of its neighbours [41]. As a type of lazy learning, the kNN classifier does not do any distance computation or comparison until the test data are given. Specifically, the object is assigned to the most common class among its k nearest neighbours; for example, the object is classified as the class of its nearest neighbour if k equals 1. Theoretically, the error rate of the kNN algorithm approaches the Bayes error as the training set size tends to infinity. However, satisfactory performance of the kNN algorithm requires a large training dataset, which results in expensive computation in practice.
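The kNN rule just described fits in a few lines (an illustrative sketch with Euclidean distance; the function name is ours):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]
```

Note that all the distance computation happens at prediction time, which is exactly the "lazy learning" cost mentioned above.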

5.1.2. Multilayer Perceptron Classifier. In neural networks, the multilayer perceptron (MLP) is an extension of the single-layer linear perceptron obtained by adding hidden layers in between [41]. An MLP is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes, that is, the input layer, one or more hidden layers, and an output layer. An MLP classifier is usually trained by the error backpropagation algorithm.

Figure 4: MHIs for different driving actions. (a) Right hand moving to shift lever (frames 640–650). (b) Right hand moving back to steering wheel from shift lever (frames 1150–1170). (c) Right hand operating the shift lever (frames 701–709). (d) Operating the shift lever (frames 794–830). (e) Right hand moving to head from steering wheel (frames 3318–3340). (f) Right hand moving back to steering wheel (frames 2792–2811). (g) Right hand moving back to steering wheel from dashboard (frames 4109–4134). (h) Right hand moving to dashboard from steering wheel (frames 4617–4638).

Figure 5: A schematic illustration of PHOG at levels 0, 1, and 2. At each resolution level, PHOG consists of a histogram of orientation gradients over each image subregion.

5.1.3. Support Vector Machine. The support vector machine (SVM) is one of the most commonly applied supervised learning algorithms. An SVM is formally defined by a separating hyperplane in a high- or infinite-dimensional space. Given labeled training data, the SVM generates an optimal hyperplane to categorize new examples. Intuitively, the SVM algorithm finds the hyperplane that gives the largest minimum distance to the training examples; the optimal separating hyperplane maximizes the margin of the training data.

6. Experiments

6.1. Holdout Experiment. We chose two standard experimental procedures, namely the holdout approach and the cross-validation approach, to verify the driving action recognition performance using the RF classifier and the PHOG features extracted from the MHI. The other three classifiers, kNN, MLP, and SVM, are compared.

Figure 6: Bar plots of classification rates from the holdout experiment, with 80% of the data used for training and the remaining for testing (kNN 88.01%, RF 96.56%, SVM 94.43%, MLP 90.93%).

In the holdout experiment, 20% of the PHOG features are randomly selected as the testing dataset, while the remaining 80% are used as the training dataset. The holdout experiment is repeated 100 times and the classification results are recorded. In each holdout cycle, the same training and testing datasets are applied to the four different classifiers simultaneously to compare their performance.

Generally, classification accuracy is one of the most common indicators used to evaluate classification performance. Figures 6 and 7 are the bar plots and box plots of the classification accuracies from the four classifiers on the same decomposed driving actions. The results are averages over 100 runs. The average classification accuracies of the kNN, RF, SVM, and MLP classifiers are 88.01%, 96.56%, 94.43%, and 90.93%, respectively. It is obvious that the RF classifier performs the best among the four classifiers compared.

To further evaluate the performance of the RF classifier, a confusion matrix is used to visualize the discrepancy between the actual class labels and the predicted results. The confusion matrix gives the full picture of the errors made by a classification model. The rows correspond to the known classes of the data, that is, the labels in the data; the columns correspond to the predictions made by the model. The value of each element in the matrix is the number of predictions made with the class corresponding to the column, for samples whose correct class is represented by the row. Thus, the diagonal elements show the number of correct classifications made for each class, and the off-diagonal elements show the errors made. Figure 8 shows the confusion matrix from the above experiment for the RF classifier.
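The row/column convention described above (rows = true classes, columns = predicted classes) can be captured in a short helper (our own sketch, not the paper's code):

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count matrix with rows as true classes and columns as predictions."""
    idx = {c: i for i, c in enumerate(labels)}
    m = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        m[idx[t]][idx[p]] += 1
    return m
```

Dividing each row by its sum turns the counts into the per-class rates shown in Figure 8.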


Figure 7: Box plots of classification rates from the holdout experiment, with 80% of the data used for training and the remaining 20% for testing.

            Action 1   Action 2   Action 3   Action 4
Action 1     0.9570     0.0200     0.0100     0.0130
Action 2     0.0287     0.9713     0          0
Action 3     0.0038     0          0.9462     0.0500
Action 4     0          0          0.0123     0.9877

Figure 8: Confusion matrix of the RF classification result from the holdout experiment (rows: true classes; columns: predicted classes).

In the figure, classes labelled one, two, three, and four correspond to hand interaction with the shift lever, operating the shift lever, interaction with the head, and interaction with the dashboard, respectively. In the confusion matrix, the columns are the predicted classes, while the rows are the true ones. For the RF classifier, the average classification rate over the four driving actions is 96.56%. The respective classification accuracies for the four driving actions are 95.70%, 97.13%, 94.62%, and 98.77% in the holdout experiment. Classes one and two tend to be confused with each other, with error rates of about 2.00% and 2.87%, respectively. On the other hand, the error rates from the confusion between classes three and four lie between 1.23% and 5.00%.

6.2. k-Fold Cross-Validation. The second part of our experiment uses k-fold cross-validation to further confirm the classification performance on the driving actions. In k-fold cross-validation, the original data set is randomly partitioned into k subsets. One subset is retained as the validation data for testing, while the remaining k − 1 subsets are used as training data. The cross-validation process is then repeated k times, which means that each of the k

International Journal of Vehicular Technology 9


Figure 9: Bar plots of classification rates from 10-fold cross-validation.


Figure 10: Box plots of classification rates from 10-fold cross-validation.

subsamples will be used exactly once as the validation data. The estimate is the average of the k results. The key property of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once. We chose 10-fold cross-validation in the experiment, which means that nine of the ten split subsets are used for training and the remaining one is reserved for testing.
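The partitioning just described can be sketched as follows (a minimal index-level sketch; the sample count of 200 is illustrative only):

```python
# k-fold splitting: every sample appears in the validation set exactly
# once across the k cycles, as described in the text.
import random

def kfold_splits(n_samples, k, seed=0):
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # random partition of the data
    folds = [idx[i::k] for i in range(k)]     # k roughly equal subsets
    for i in range(k):
        test = folds[i]                                            # 1 fold held out
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]  # k - 1 folds
        yield train, test

splits = list(kfold_splits(200, 10))          # 10-fold, as in the experiment
```

Averaging a classifier's accuracy over the ten test folds gives the cross-validation estimate reported in Figures 9 and 10.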

The evaluation procedure is similar to the holdout experiment. The cross-validation experiment was also conducted 100 times for each of the classification methods. Each time, the PHOG features extracted from the driving action dataset were randomly divided into 10 folds. The average classification accuracies over the 100 repetitions are shown in the bar plots of Figure 9 and the box plots of Figure 10. The average classification accuracies of the kNN, RF, SVM, and MLP classifiers are 94.64%, 98.30%, 97.39%, and 95.91%, respectively. From the bar plots, box plots, and confusion matrix in Figures 9, 10, and 11, the RF classifier clearly outperforms the other three classifiers.

            Action 1   Action 2   Action 3   Action 4
Action 1     0.9657     0.0237     0.0004     0.0102
Action 2     0.0292     0.9708     0          0
Action 3     0.0041     0          0.9483     0.0476
Action 4     0          0          0.0147     0.9853

Figure 11: Confusion matrix of the RF classification from the 10-fold cross-validation experiment (rows: true classes; columns: predicted classes).

7. Conclusion

In this paper, we proposed an efficient approach to recognise driving action primitives by the joint application of motion history image and pyramid histogram of oriented gradients. The proposed driving action primitives lead to a hierarchical representation of driver activities. The manually labelled action primitives are jointly represented by the motion history image and the pyramid histogram of oriented gradients (PHOG). The random forest classifier was exploited to evaluate the classification performance, giving an accuracy of over 94% in the holdout and cross-validation experiments. This compares favorably with some other commonly used classification methods.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The project is supported by Suzhou Municipal Science and Technology Foundation Grants SS201109 and SYG201140.

References

[1] WHO, "World report on road traffic injury prevention," http://www.who.int/violence_injury_prevention/publications/road_traffic/world_report/en/.

[2] T. B. Moeslund, A. Hilton, and V. Kruger, "A survey of advances in vision-based human motion capture and analysis," Computer Vision and Image Understanding, vol. 104, no. 2-3, pp. 90–126, 2006.

[3] X. Fan, B.-C. Yin, and Y.-F. Sun, "Yawning detection for monitoring driver fatigue," in Proceedings of the 6th International Conference on Machine Learning and Cybernetics (ICMLC '07), pp. 664–668, Hong Kong, August 2007.

[4] R. Grace, V. E. Byrne, D. M. Bierman et al., "A drowsy driver detection system for heavy vehicles," in Proceedings of the 17th Digital Avionics Systems Conference, vol. 2, pp. I36/1–I36/8, Bellevue, Wash, USA, 1998.

[5] E. Wahlstrom, O. Masoud, and N. Papanikolopoulos, "Vision-based methods for driver monitoring," in Proceedings of the IEEE Intelligent Transportation Systems, vol. 2, pp. 903–908, Shanghai, China, October 2003.

[6] A. Doshi and M. M. Trivedi, "On the roles of eye gaze and head dynamics in predicting driver's intent to change lanes," IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 3, pp. 453–462, 2009.

[7] J. J. Jo, S. J. Lee, H. G. Jung, K. R. Park, and J. J. Kim, "Vision-based method for detecting driver drowsiness and distraction in driver monitoring system," Optical Engineering, vol. 50, no. 12, pp. 127202–127224, 2011.

[8] X. Liu, Y. D. Zhu, and K. Fujimura, "Real-time pose classification for driver monitoring," in Proceedings of the IEEE Intelligent Transportation Systems, pp. 174–178, Singapore, September 2002.

[9] P. Watta, S. Lakshmanan, and Y. Hou, "Nonparametric approaches for estimating driver pose," IEEE Transactions on Vehicular Technology, vol. 56, no. 4, pp. 2028–2041, 2007.

[10] T. Kato, T. Fujii, and M. Tanimoto, "Detection of driver's posture in the car by using far infrared camera," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 339–344, Parma, Italy, June 2004.

[11] S. Y. Cheng, S. Park, and M. M. Trivedi, "Multi-spectral and multi-perspective video arrays for driver body tracking and activity analysis," Computer Vision and Image Understanding, vol. 106, no. 2-3, pp. 245–257, 2007.

[12] H. Veeraraghavan, N. Bird, S. Atev, and N. Papanikolopoulos, "Classifiers for driver activity monitoring," Transportation Research Part C, vol. 15, no. 1, pp. 51–67, 2007.

[13] H. Veeraraghavan, S. Atev, N. Bird, P. Schrater, and N. Papanikolopoulos, "Driver activity monitoring through supervised and unsupervised learning," in Proceedings of the 8th International IEEE Conference on Intelligent Transportation Systems, pp. 895–900, Vienna, Austria, September 2005.

[14] C. H. Zhao, B. L. Zhang, J. He, and J. Lian, "Recognition of driving postures by contourlet transform and random forests," IET Intelligent Transport Systems, vol. 6, no. 2, pp. 161–168, 2012.

[15] C. H. Zhao, B. L. Zhang, X. Z. Zhang, S. Q. Zhao, and H. X. Li, "Recognition of driving postures by combined features and random subspace ensemble of multilayer perceptron classifiers," Neural Computing and Applications, vol. 22, no. 1, supplement, pp. 175–184, 2013.

[16] C. Tran, A. Doshi, and M. M. Trivedi, "Modeling and prediction of driver behavior by foot gesture analysis," Computer Vision and Image Understanding, vol. 116, no. 3, pp. 435–445, 2012.

[17] J. C. McCall and M. M. Trivedi, "Visual context capture and analysis for driver attention monitoring," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 332–337, Washington, DC, USA, October 2004.

[18] K. Torkkola, N. Massey, and C. Wood, "Driver inattention detection through intelligent analysis of readily available sensors," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 326–331, Washington, DC, USA, October 2004.

[19] D. A. Johnson and M. M. Trivedi, "Driving style recognition using a smartphone as a sensor platform," in Proceedings of the 14th IEEE International Intelligent Transportation Systems Conference (ITSC '11), pp. 1609–1615, Washington, DC, USA, October 2011.

[20] A. V. Desai and M. A. Haque, "Vigilance monitoring for operator safety: a simulation study on highway driving," Journal of Safety Research, vol. 37, no. 2, pp. 139–147, 2006.

[21] G. Rigas, Y. Goletsis, P. Bougia, and D. I. Fotiadis, "Towards driver's state recognition on real driving conditions," International Journal of Vehicular Technology, vol. 2011, Article ID 617210, 14 pages, 2011.

[22] M. H. Kutila, M. Jokela, T. Makinen, J. Viitanen, G. Markkula, and T. W. Victor, "Driver cognitive distraction detection: feature estimation and implementation," Proceedings of the Institution of Mechanical Engineers Part D, vol. 221, no. 9, pp. 1027–1040, 2007.

[23] A. F. Bobick and J. W. Davis, "The recognition of human movement using temporal templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 257–267, 2001.

[24] Y. Wang, K. Huang, and T. Tan, "Human activity recognition based on R transform," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–8, Minneapolis, Minn, USA, June 2007.

[25] H. Chen, H. Chen, Y. Chen, and S. Lee, "Human action recognition using star skeleton," in Proceedings of the International Workshop on Video Surveillance and Sensor Networks (VSSN '06), pp. 171–178, Santa Barbara, Calif, USA, October 2006.

[26] W. Liang and D. Suter, "Informative shape representations for human action recognition," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 1266–1269, Hong Kong, August 2006.

[27] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proceedings of the International Conference on Computer Vision, vol. 2, pp. 726–733, Nice, France, October 2003.

[28] A. R. Ahad, T. Ogata, J. K. Tan, H. S. Kim, and S. Ishikawa, "Motion recognition approach to solve overwriting in complex actions," in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG '08), pp. 1–6, Amsterdam, The Netherlands, September 2008.

[29] S. Ali and M. Shah, "Human action recognition in videos using kinematic features and multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 288–303, 2010.

[30] S. Danafar and N. Gheissari, "Action recognition for surveillance applications using optic flow and SVM," in Proceedings of the Asian Conference on Computer Vision (ACCV '07), pp. 457–466, Tokyo, Japan, November 2007.

[31] I. Laptev and T. Lindeberg, "Space-time interest points," in Proceedings of the 9th IEEE International Conference on Computer Vision, pp. 432–439, Nice, France, October 2003.

[32] I. Laptev, B. Caputo, C. Schuldt, and T. Lindeberg, "Local velocity-adapted motion events for spatio-temporal recognition," Computer Vision and Image Understanding, vol. 108, no. 3, pp. 207–229, 2007.

[33] S. Park and M. Trivedi, "Driver activity analysis for intelligent vehicles: issues and development framework," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 644–649, Las Vegas, Nev, USA, June 2005.

[34] A. Bosch, A. Zisserman, and X. Munoz, "Representing shape with a spatial pyramid kernel," in Proceedings of the 6th ACM International Conference on Image and Video Retrieval (CIVR '07), pp. 401–408, Amsterdam, The Netherlands, July 2007.

[35] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, San Diego, Calif, USA, June 2005.

[36] G. R. Bradski and J. Davis, "Motion segmentation and pose recognition with motion history gradients," in Proceedings of the 5th IEEE Workshop on Applications of Computer Vision, pp. 238–244, Palm Springs, Calif, USA, 2000.

[37] O. Masoud and N. Papanikolopoulos, "A method for human action recognition," Image and Vision Computing, vol. 21, no. 8, pp. 729–743, 2003.

[38] H. Yi, D. Rajan, and L.-T. Chia, "A new motion histogram to index motion content in video segments," Pattern Recognition Letters, vol. 26, no. 9, pp. 1221–1231, 2005.

[39] J. Flusser, T. Suk, and B. Zitova, Moments and Moment Invariants in Pattern Recognition, John Wiley & Sons, Chichester, UK, 2009.

[40] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), pp. 2169–2178, New York, NY, USA, June 2006.

[41] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, 2nd edition, 2000.

[42] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.



of the action recognition is to classify image sequences to a human action based on the temporal evolution of video images. Much progress has been made on how to distinguish actions in daily life using cameras and machine learning algorithms. HAR has no unique definition; it changes depending on the level of abstraction. Moeslund et al. [2] proposed different taxonomies, that is, action primitive, action, and activity. An action primitive is a very basic movement that can be described at the decomposed level. An action is composed of action primitives and describes a cyclic or whole-body movement. Activities consist of a sequence of actions performed by one or more participants. In the recognition of drivers' actions, the context is usually not taken into account, for example, the background environment variation outside the window and interactions with another person or moving object. Accordingly, this paper only focuses on partial body movements of the driver.

There exist some works on driver activity monitoring. To monitor a driver's behavior, some of the works focused on the detection of driver alertness through monitoring the eyes, face, head, or facial expressions [3–7]. In one study, the driver's face was tracked, and yaw orientation angles were used to estimate the driver's face pose [8]. The Fisherface approach was applied by Watta et al. to represent and recognise seven driver poses: looking over the left shoulder, in the left rear-view mirror, at the road ahead, down at the instrument panel, at the centre rear-view mirror, at the right rear-view mirror, or over the right shoulder [9]. In order to minimize the influence of varying illumination and background, Kato et al. used a far infrared camera to detect the driver's face direction, such as leftward, frontward, and rightward [10]. Cheng et al. presented a combination of thermal infrared and color images with multiple cameras to track important driver body parts and to analyze driver activities such as steering the car forward, turning left, and turning right [11]. Veeraraghavan et al. used the driver's skin-region information to group two actions: grasping the steering wheel and talking on a cell phone [12, 13]. Zhao et al. extended and improved Veeraraghavan's work to recognise four driving postures, that is, grasping the steering wheel, operating the shift lever, eating, and talking on a cell phone [14, 15]. Tran et al. studied driver behaviors by foot gesture analysis [16]. Other works focused on capturing the driver's attention by combining different vision-based features with the physical status of the vehicle [17–22].

The task of driver activity monitoring can generally be studied in the human action recognition framework, the emphasis of which is often on finding good feature representations that can tolerate variations in viewpoint, human subject, background, illumination, and so on. There are two main categories of feature descriptions: global descriptions and local descriptions. The former consider the visual observation as a whole, while the latter describe the observation as a collection of independent patches or local descriptors. Generally, a global representation is derived from silhouettes, edges, or optical flow. One of the earliest global representation approaches, called motion history image (MHI), was proposed by Bobick and Davis [23]; it extracts silhouettes by using background subtraction

and aggregates the differences between subsequent frames in an action sequence. Other global description methods include the R transform [24], contour-based approaches [25, 26], and optical flow [27–30]. The weaknesses of global representations include sensitivity to noise, partial occlusions, and variations in viewpoint. Instead, a local representation describes the observation as a collection of space-time interest points [31] or local descriptors, which usually does not require accurate localisation and background subtraction. Local representations have the advantage of being invariant to differences in viewpoint, appearance of the person, and partial occlusions. The representative local descriptor is the space-time interest point detector proposed by Laptev and Lindeberg [31], which, however, has the shortcoming that only a small number of stable interest points are available in practice. Some derivations have been proposed, for example, extracted space-time cuboids [32].

In this paper, we studied driver activity recognition by comprehensively considering action detection, representation, and classification. Our contributions include three parts. The first part is our deviation from many published works on driver posture based on static images from a driver's action sequence, which have the potential problem of confusion caused by similar postures. It is very possible that two frames of visually similar posture are extracted from two completely different action image sequences. For example, the moment/frame in which a driver moves a cell phone across his or her mouth can be confused with eating. Following the action definition in [2], which is based on the combination of basic movements, we regard driving activity as a space-time action instead of a static, space-limited posture. The main driving activities we consider are hand-conducted actions such as eating and using a cell phone.

The second contribution of this paper is our proposal of the driving action decomposition. Generally, the driving actions that take place in the driver's seat are mainly performed by hand, which include but are not limited to eating, smoking, talking on the cell phone, and operating the shift lever. These actions or activities are usually performed by shifting the hand position, which is confined to the driver's seat. Following the train of thought in [2], we regard the actions or activities as a combination of a number of basic movements or action primitives. We created a driving action dataset similar to the SEU dataset [14], with four different types of action sequences, including operating the shift lever, responding to a cell phone call, eating, and smoking. The actions are then decomposed into four action primitives, that is, hand interaction with the shift lever, hand operating the shift lever, hand interaction with the head, and hand interaction with the dashboard. Upon the classification of these action primitives, the driving actions involving eating, smoking, and other abnormal behaviors can accordingly be recognised as a combination of action primitives [33].

The last contribution of this paper is the proposal of a global grid-based representation for the driving actions, which is a combination of the motion history image (MHI) [23] and the pyramid histogram of oriented gradients (PHOG) [34], and the application of the random forest (RF) classifier for driving action recognition. Encoding the region


of interest in the driver's seat is a natural choice, as there is little noise and no partial occlusion in the video. The action silhouettes were first extracted to represent action primitives by applying MHI to aggregate the differences between subsequent frames. To achieve better discrimination than MHI alone, the pyramid histogram of oriented gradients of the MHI was calculated as the feature for further training and classification. PHOG is a spatial pyramid extension of the histogram of oriented gradients (HOG) [35] descriptor, which has been used extensively in computer vision. After the comparison of several different classification algorithms, the random forest (RF) classifier was chosen for driving action recognition, as it offers satisfactory performance.

The rest of the paper is organized as follows. Section 2 gives a brief introduction to our driving posture dataset creation and the necessary preprocessing. Sections 3 and 4 review the motion history image and the pyramid histogram of oriented gradients, with explanations of how they are applied in driving posture description, respectively. Section 5 introduces the random forest classifier and three other commonly used classification methods for comparison. Section 6 reports the experimental results, followed by the conclusion in Section 7.

2. Driving Action Dataset Creation and Preprocessing

A driving action dataset was prepared, which contains 20 video clips at 640 × 424 and 24 fps. The video was recorded using a Nikon D90 camera at a car park at Xi'an Jiaotong-Liverpool University. Ten male drivers and ten female drivers participated in the experiment by pretending to drive in the car and conducting several actions that simulated real driving situations. Five predefined driving actions were imitated, that is, turning the steering wheel, operating the shift lever, eating, smoking, and using a cell phone.

There are five steps involved in simulating the driving activities by each participant.

Step 1. A driver first grasps the steering wheel and slightly turns it.

Step 2. The driver's right hand moves to the shift lever and operates it several times before moving back to the steering wheel.

Step 3. The driver takes a cell phone from the dashboard, responds to a phone call, and then puts it back with his or her right hand.

Step 4. The driver takes a cookie from the dashboard and eats it using his or her right hand.

Step 5. A male driver takes a cigarette from the dashboard, puts it into his mouth, uses a lighter to light the cigarette, and then puts it back on the dashboard.

This experiment extracted twenty consecutive picture sequences from the video of the dataset for further experimentation.

Figure 1: The vertical axis stands for the area of difference points compared to the previous frame, obtained by applying Otsu's thresholding method [26]; the horizontal axis stands for the frame number.

2.1. Action Detection and Segmentation. Similar to many intelligent video analysis systems, action recognition should start with motion detection in a continuous video stream, for which many approaches are available. Among the popular approaches, frame differencing is the simplest and most efficient method; it takes the difference between two frames to detect the moving object. Frame differencing is widely applied with proven performance, particularly when a fixed camera is used to observe dynamic events in a scene.

Within each driving action sequence, the frame differences between two adjacent image frames are first calculated, followed by a thresholding operation to identify moving objects. Otsu's thresholding method [26] was chosen, which minimizes the intraclass variance of the black and white pixels. The existence of moving objects is determined by evaluating whether there exist connected regions in the binary image, and the location of the moving objects can be further calculated based on the total areas and coordinates of the connected regions. The details are illustrated by Figure 1.
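The frame-differencing and thresholding step can be sketched as follows. This is a minimal sketch on synthetic frames; Otsu's method is written out directly here (a library call such as OpenCV's Otsu flag would serve the same purpose), and is not the authors' code:

```python
# Frame differencing followed by Otsu's thresholding, as described above.
import numpy as np

def otsu_threshold(gray):
    """Return the threshold maximizing between-class variance
    (equivalently, minimizing the intraclass variance)."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    total = gray.size
    sum_all = np.dot(np.arange(256), hist)
    w0, sum0 = 0, 0.0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2   # between-class variance
        if var > best_var:
            best_var, best_t = var, t
    return best_t

# Two synthetic frames: an 8x8 bright block moves by (2, 2) between them.
prev = np.zeros((64, 64), dtype=np.uint8)
curr = np.zeros((64, 64), dtype=np.uint8)
prev[10:18, 10:18] = 200
curr[12:20, 12:20] = 200

diff = np.abs(prev.astype(int) - curr.astype(int)).astype(np.uint8)
binary = diff > otsu_threshold(diff)   # moving-object mask
area = int(binary.sum())               # area of difference points (cf. Figure 1)
```

Summing the mask per frame pair yields exactly the "area of difference points" curve plotted in Figure 1.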

In the following section, the segmented action images are further manually labelled into four different categories of action primitives based on the trajectory of the driver's right hand, as shown in Figure 2. The first category is moving to the shift lever with the right hand from the steering wheel, or moving back to the steering wheel from the shift lever. The second category is operating the shift lever with the right hand. The third category is moving to the dashboard from the steering wheel with the right hand, or moving back to the steering wheel from the dashboard. The fourth category is moving to the head from the steering wheel, or moving back to the steering wheel from the head with the right hand.

3. Motion History Image (MHI)

The motion history image (MHI) approach is a view-based temporal template approach developed by Bobick and Davis [23], which is simple but robust in the representation of movements and is widely employed in action recognition,


motion analysis, and other related applications [36–38]. The motion history image (MHI) describes how the motion moves in an image sequence. Another representation, called the motion energy image (MEI), demonstrates the presence of any motion or a spatial pattern in the image sequence. Both the MHI and MEI templates comprise the MHI template-matching method.

Figure 3 shows a movement in driving. The first row shows some key frames in a driving action. The second and third rows are the frame differences and the corresponding binary images obtained by applying Otsu's thresholding. The fourth and fifth rows are the cumulative MEI and MHI images, respectively. MHI pixel intensity is a function of the recency of motion in a sequence, where brighter values correspond to more recent motion. We currently use a simple replacement and linear decay operator on the binary difference frames. The formal definitions are briefly explained below:

$$\text{diff}(x,y,t) = \begin{cases} 0, & \text{if } t = 1 \\ \left| I(x,y,t-1) - I(x,y,t) \right|, & \text{otherwise,} \end{cases} \tag{1}$$

where $\text{diff}(x,y,t)$ is a difference image sequence indicating the difference compared to the previous frame. Let

$$D(x,y,t) = \begin{cases} 0, & \text{if } \text{diff}(x,y,t) < \text{threshold} \\ 1, & \text{otherwise,} \end{cases} \tag{2}$$

where $D(x,y,t)$ is a binary image sequence indicating the regions of motion. Then the motion energy image is defined as

$$\text{MEI}(x,y,t) = \bigcup_{t=\text{action start frame}}^{\text{action end frame}} D(x,y,t). \tag{3}$$

Both motion history images and motion energy images were introduced to capture motion information in images [23]. While MEI only indicates where the motion is, the motion history image $\text{MHI}(x,y,t)$ represents the way the object moves, which can be defined as

$$\text{MHI}(x,y,t) = \begin{cases} 255, & \text{if } D(x,y,t) = 1 \\ \max\{0, \text{MHI}(x,y,t-1) - 1\}, & \text{if } D(x,y,t) \neq 1,\ \dfrac{255}{\text{pic\_seq\_length}} \le 1 \\ \max\left\{0, \text{MHI}(x,y,t-1) - \left\lfloor \dfrac{255}{\text{pic\_seq\_length}} \right\rfloor\right\}, & \text{if } D(x,y,t) \neq 1,\ \dfrac{255}{\text{pic\_seq\_length}} > 1. \end{cases} \tag{4}$$

The result is a scalar-valued image where the most recently moving pixels are the brightest. The MHI can represent the location, the shape, and the movement direction of an action in a picture sequence. As the MEI can be obtained by thresholding the MHI above zero, we will only consider features derived from the MHI in the following.

After the driving actions were detected and segmented from the raw video dataset, motion history images were extracted for each of the four decomposed action sets. Figure 4 demonstrates how the motion history image is calculated to represent movements for each decomposed action sequence. In the figure, the left column and the middle column are the start and end frames of a decomposed action snippet, respectively. The right column is the MHI calculated for the corresponding action snippet.
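The replacement-and-linear-decay update of equation (4) can be sketched as follows, assuming the binary motion masks D(x, y, t) are already available (the toy one-pixel "object" below is purely illustrative):

```python
# MHI update per equation (4): moving pixels are set to 255, all other
# pixels fade by a fixed decay per frame; recent motion stays brightest.
import numpy as np

def update_mhi(mhi, motion_mask, decay):
    faded = np.maximum(0.0, mhi - decay)          # linear decay of old motion
    return np.where(motion_mask, 255.0, faded)    # replacement for new motion

# Toy sequence: a one-pixel "object" moving right along row 2.
seq_len = 5
decay = max(1, 255 // seq_len)    # cf. floor(255 / pic_seq_length) in (4)
mhi = np.zeros((4, 8))
for t in range(seq_len):
    mask = np.zeros((4, 8), dtype=bool)
    mask[2, t] = True             # motion detected at column t in frame t
    mhi = update_mhi(mhi, mask, decay)

mei = mhi > 0                     # MEI: thresholding the MHI above zero
```

After the loop, the pixel intensities along row 2 decrease from 255 at the most recent position toward older positions, encoding the direction of the movement.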

4. Pyramid Histogram of Oriented Gradients (PHOG)

The motion history image (MHI) is not appropriate to be directly exploited as a feature for comparison or classification in practical applications. In the basic MHI method [23], after calculating the MHI and MEI, feature vectors are computed employing the seven higher-order Hu moments. These feature vectors are then used for recognition. However, Hu's moment invariants have some drawbacks,

particularly limited recognition power [39]. In this paper, the histogram of oriented gradients feature is extracted from the MHI as a more suitable feature for classification.

In many image processing tasks, the local geometrical shapes within an image can be characterized by the distribution of edge directions, called histograms of oriented gradients (HOG) [35]. HOG can be calculated by evaluating a dense grid of well-normalized local histograms of image gradient orientations over the image windows. HOG has some important advantages over other local shape descriptors; for example, it is invariant to small deformations and robust to outliers and noise.

The HOG feature encodes the gradient orientation of an image patch without considering where this orientation originates from within the patch. Therefore, it is not discriminative enough when the spatial property of the underlying structure of the image patch is important. The objective of a newly proposed improved descriptor, the pyramid histogram of oriented gradients (PHOG) [34], is to take the spatial property of the local shape into account while representing an image by HOG. The spatial information is represented by tiling the image into regions at multiple resolutions, based on spatial pyramid matching [40]. Each image is divided into a sequence of increasingly finer spatial grids by repeatedly doubling the number of divisions in each axis direction. The number of points in each grid cell is then recorded.

International Journal of Vehicular Technology 5

Figure 2: Four manually decomposed action primitives. (a) Hand interaction with shift lever (frames 640–660 and 1150–1170). (b) Hand operating the shift lever (frames 794–830 and 701–709). (c) Hand interaction with head (frames 3318–3340 and 2792–2811). (d) Hand interaction with dashboard (frames 4617–4638 and 4109–4134).

The number of points in a cell at one level is simply the sum over those contained in the four cells it is divided into at the next level, thus forming a pyramid representation. The cell counts at each level of resolution are the bin counts for the histogram representing that level. The soft correspondence between two point sets can then be computed as a weighted sum over the histogram intersections at each level.
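The grid tiling and cell-count bookkeeping described above can be sketched in a few lines. This is an illustrative implementation rather than the authors' code; the row-major cell indexing and the handling of boundary points are our own assumptions.

```python
def pyramid_counts(points, width, height, levels=3):
    """Count points per cell in 2^l x 2^l grids for levels 0..levels."""
    histograms = []
    for l in range(levels + 1):
        n = 2 ** l                                # cells per axis at this level
        cells = [0] * (n * n)
        for x, y in points:
            cx = min(int(x * n / width), n - 1)   # clamp boundary points
            cy = min(int(y * n / height), n - 1)
            cells[cy * n + cx] += 1               # row-major cell index
        histograms.append(cells)
    return histograms
```

Summing the four child cells at level l + 1 recovers the parent cell count at level l, which is exactly the pyramid property stated above.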

The resolution of an MHI image is 640 × 480. An MHI is divided into small spatial cells based on different pyramid levels. We follow the practice in [34] by limiting the number of levels to L = 3 to prevent overfitting. Figure 5 shows that the grid at level l has 2^l × 2^l cells.

The magnitude m(x, y) and orientation θ(x, y) of the gradient at a pixel (x, y) are calculated as follows:

m(x, y) = sqrt(g_x(x, y)^2 + g_y(x, y)^2),

θ(x, y) = arctan(g_x(x, y) / g_y(x, y)),        (5)

where g_x(x, y) and g_y(x, y) are the image gradients along the x and y directions. Each gradient orientation is quantized into K bins. In each cell at every level, the gradients over all pixels are accumulated to form a local K-bin histogram. As a result, a ROI at level l is represented by a K · 2^l · 2^l dimensional vector.

All the cells at the different pyramid levels are combined to form a final PHOG vector of dimension d = K · sum_{l=0}^{L} 4^l, representing the whole ROI.

The dimension of the PHOG feature (e.g., d = 680 when K = 8 and L = 3) is relatively high. Many dimension reduction methods can be applied to alleviate this problem. We employ the widely used principal component analysis (PCA) [41] due to its simplicity and effectiveness.
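The dimension formula is easy to check numerically; a one-line helper (name ours) reproduces the figure quoted in the text:

```python
def phog_dimension(K, L):
    """PHOG vector dimension: d = K * sum_{l=0}^{L} 4^l."""
    return K * sum(4 ** l for l in range(L + 1))
```

With K = 8 bins and L = 3 levels this gives 8 × (1 + 4 + 16 + 64) = 680, matching the dimension stated above.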

5. Random Forest (RF) and Other Classification Algorithms

Random forest (RF) [42] is an ensemble classifier built from many decision tree models and can be used for classification or regression. A particular advantage of RF is that accuracy and variable importance information are provided with the results. Random forests create a number of classification trees. When a vector representing a new object is input for classification, it is sent to every tree in the forest. A different subset of the training data is selected (≈2/3), with replacement, to train each tree, and the remaining training data are used to estimate error and variable importance. Class assignment is made by counting the votes from all of the trees.

RF has only two hyperparameters: the number of variables M in the random subset at each node and the number


Figure 3: Example of the driver's right hand moving to the shift lever from the steering wheel (frames 640–656). The first row shows key frames of the driving action; the second row shows the corresponding frame-difference images diff(x, y, t); the third row shows the binary images D(x, y, t) resulting from thresholding; the fourth row shows the cumulative motion energy images MEI(x, y, t); and the fifth row shows the cumulative motion history images MHI(x, y, t).

of trees T in the forest [42]. Breiman's RF error rate depends on two factors: the correlation between any pair of trees and the strength of each individual tree in the forest. Increasing the correlation increases the forest error rate, while increasing the strength of the individual trees decreases this misclassification rate. Reducing M reduces both the correlation and the strength; M is often set to the square root of the number of inputs.

When the training set for a particular tree is drawn by sampling with replacement, about one-third of the cases are left out of the sample set.

The RF algorithm can be summarised as follows

(1) Choose parameter T, the number of trees to grow.

(2) Choose parameter m, the number of variables used to split each node, with m ≪ M, where M is the number of input variables; m is held constant while growing the forest.

(3) Grow T trees. When growing each tree, do the following.

(i) Construct a bootstrap sample of size n, drawn from S_n = {(X_i, y_i)}, i = 1, …, n, with replacement, and grow a tree from this bootstrap sample.

(ii) When growing a tree, at each node select m variables at random and use them to find the best split.

(iii) Grow the tree to its maximal extent; there is no pruning.

(4) To classify a point X, collect the votes from every tree in the forest and then use majority voting to decide the class label.
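The steps above can be sketched in miniature. This toy version grows depth-one trees (decision stumps) instead of full unpruned trees, purely to keep the illustration short; the bootstrap sampling, random feature subsets, and majority voting follow the listed algorithm. All function names are ours.

```python
import random
from collections import Counter

def train_stump(X, y, feats):
    """Depth-one tree: best (feature, threshold) split on the given features."""
    best = None
    for f in feats:
        for t in sorted(set(row[f] for row in X)):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            lmaj = Counter(left).most_common(1)[0][0]
            rmaj = Counter(right).most_common(1)[0][0]
            err = sum(l != lmaj for l in left) + sum(r != rmaj for r in right)
            if best is None or err < best[0]:
                best = (err, f, t, lmaj, rmaj)
    if best is None:                       # degenerate sample: predict majority
        maj = Counter(y).most_common(1)[0][0]
        return (feats[0], float("inf"), maj, maj)
    return best[1:]

def predict_stump(stump, x):
    f, t, lmaj, rmaj = stump
    return lmaj if x[f] <= t else rmaj

def train_forest(X, y, T=25, m=1):
    n, M = len(X), len(X[0])
    forest = []
    for _ in range(T):
        idx = [random.randrange(n) for _ in range(n)]   # bootstrap of size n
        feats = random.sample(range(M), m)              # m random features
        forest.append(train_stump([X[i] for i in idx],
                                  [y[i] for i in idx], feats))
    return forest

def predict_forest(forest, x):
    votes = Counter(predict_stump(s, x) for s in forest)
    return votes.most_common(1)[0][0]    # majority vote over all trees
```

A real random forest would grow each tree to maximal depth with a fresh random feature subset at every node; stumps stand in for that here only for brevity.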

In this paper, we also compare the accuracy of RF with several popular classification methods, including the k-nearest neighbor (kNN) classifier, the multilayer perceptron (MLP), and the support vector machine (SVM), on the driving action dataset.

5.1. Other Classification Methods

5.1.1. k-Nearest Neighbor Classifier. The k-nearest neighbour (kNN) classifier, one of the most classic and simplest classifiers in machine learning, classifies an object based on the minimal distance to training examples in feature space, by a majority vote of its neighbours [41]. As a type of lazy learning, the kNN classifier does not do any distance computation or comparison until the test data are given. Specifically, the object is assigned to the most common class among its k nearest neighbours; for example, the object is classified as the class of its nearest neighbour when k equals 1. Theoretically, the error rate of the kNN algorithm approaches the Bayes error as the training set size tends to infinity. However, satisfactory performance of the kNN algorithm requires a large training dataset, which results in expensive computation in practice.
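The lazy, query-time nature of kNN is visible in a minimal sketch (function name ours; squared Euclidean distance is used, which preserves the neighbour ordering):

```python
from collections import Counter

def knn_classify(train_X, train_y, query, k=3):
    """Lazy kNN: all distance work happens at query time."""
    def sq_dist(p):
        # squared Euclidean distance; same ordering as true distance
        return sum((a - b) ** 2 for a, b in zip(p, query))
    neighbours = sorted(zip(train_X, train_y), key=lambda pl: sq_dist(pl[0]))
    top_k = [label for _, label in neighbours[:k]]
    return Counter(top_k).most_common(1)[0][0]   # majority vote among k nearest
```

With k = 1 this reduces to nearest-neighbour classification, as noted above.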

5.1.2. Multilayer Perceptron Classifier. In neural networks, the multilayer perceptron (MLP) is an extension of the single-


Figure 4: MHIs for different driving actions. (a) Right hand moving to the shift lever (frames 640–650). (b) Right hand moving back to the steering wheel from the shift lever (frames 1150–1170). (c) Right hand operating the shift lever (frames 701–709). (d) Operating the shift lever (frames 794–830). (e) Right hand moving to the head from the steering wheel (frames 3318–3340). (f) Right hand moving back to the steering wheel (frames 2792–2811). (g) Right hand moving back to the steering wheel from the dashboard (frames 4109–4134). (h) Right hand moving to the dashboard from the steering wheel (frames 4617–4638).

Figure 5: A schematic illustration of PHOG at levels 0, 1, and 2. At each resolution level, PHOG consists of a histogram of orientation gradients over each image subregion.

layer linear perceptron, obtained by adding hidden layers in between [41]. An MLP is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes, that is, an input layer, one or more hidden layers, and an output layer. An MLP classifier is usually trained by the error backpropagation algorithm.

5.1.3. Support Vector Machine. The support vector machine (SVM) is one of the most commonly applied supervised learning algorithms. An SVM is formally defined by a separating hyperplane in a high- or infinite-dimensional space. Given labeled training data, the SVM generates an optimal hyperplane to categorize new examples. Intuitively, the SVM algorithm finds the hyperplane that gives the largest minimum distance to the training examples; that is, the optimal separating hyperplane maximizes the margin of the training data.

6. Experiments

6.1. Holdout Experiment. We chose two standard experimental procedures, namely, the holdout approach and the cross-validation approach, to verify the driving action recognition


Figure 6: Bar plots of classification rates from the holdout experiment, with 80% of the data used for training and the remainder for testing: kNN 88.01%, RF 96.56%, SVM 94.43%, MLP 90.93%.

performance, using the RF classifier and the PHOG features extracted from the MHI. Three other classifiers, kNN, MLP, and SVM, are compared.

In the holdout experiment, 20% of the PHOG features are randomly selected as the testing dataset, while the remaining 80% are used as the training dataset. The holdout experiment is repeated 100 times and the classification results are recorded. In each holdout experiment cycle, the same training and testing datasets are applied to the four different classifiers simultaneously to compare their performance.
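A random 80/20 holdout split of the kind described above can be sketched as follows (function name and seeding are ours; the paper does not specify its shuffling mechanism):

```python
import random

def holdout_split(X, y, test_frac=0.2, seed=None):
    """Randomly reserve a fraction of the samples for testing."""
    rng = random.Random(seed)
    idx = list(range(len(X)))
    rng.shuffle(idx)                      # random permutation of sample indices
    n_test = int(len(X) * test_frac)
    test, train = idx[:n_test], idx[n_test:]
    return ([X[i] for i in train], [y[i] for i in train],
            [X[i] for i in test], [y[i] for i in test])
```

Repeating this split 100 times with different seeds and averaging the accuracies reproduces the protocol used in the experiment.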

Generally, classification accuracy is one of the most common indicators used to evaluate classification performance. Figures 6 and 7 show the bar plots and box plots of the classification accuracies of the four classifiers on the same decomposed driving actions. The results are averaged over 100 runs. The average classification accuracies of the kNN, RF, SVM, and MLP classifiers are 88.01%, 96.56%, 94.43%, and 90.93%, respectively. It is evident that the RF classifier performs the best among the four classifiers compared.

To further evaluate the performance of the RF classifier, a confusion matrix is used to visualize the discrepancy between the actual class labels and the predicted results. The confusion matrix gives the full picture of the errors made by a classification model and shows how the predictions are made. The rows correspond to the known classes of the data, that is, the labels in the data; the columns correspond to the predictions made by the model. The value of each element in the matrix is the number of predictions made with the class corresponding to the column, for instances whose correct class is represented by the row. Thus, the diagonal elements show the number of correct classifications made for each class, and the off-diagonal elements show the errors made. Figure 8 shows the confusion matrix from the above experiment for the RF classifier.
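A minimal sketch of how such a row-normalized confusion matrix can be computed from true and predicted labels (function name ours):

```python
def confusion_matrix(y_true, y_pred, classes):
    """Row-normalized confusion matrix: rows = true class, cols = predicted."""
    counts = {c: {p: 0 for p in classes} for c in classes}
    for t, p in zip(y_true, y_pred):
        counts[t][p] += 1
    matrix = []
    for c in classes:
        total = sum(counts[c].values()) or 1     # avoid division by zero
        matrix.append([counts[c][p] / total for p in classes])
    return matrix
```

Each row sums to 1, so the diagonal entries are per-class recognition rates, matching how Figures 8 and 11 are read in the text.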

Figure 7: Box plots of classification rates from the holdout experiment, with 80% of the data used for training and the remainder for testing.

Figure 8: Confusion matrix of the RF classification result from the holdout experiment.

             Action 1   Action 2   Action 3   Action 4
Action 1      0.9570     0.0200     0.0100     0.0130
Action 2      0.0287     0.9713     0          0
Action 3      0.0038     0          0.9462     0.0500
Action 4      0          0          0.0123     0.9877

In the figure, the classes labelled one, two, three, and four correspond to hand interaction with the shift lever, operating the shift lever, interaction with the head, and interaction with the dashboard, respectively. In the confusion matrix, the columns are the predicted classes while the rows are the true ones. For the RF classifier, the average classification rate over the four driving actions is 96.56%. The respective classification accuracies for the four driving actions in the holdout experiment are 95.70%, 97.13%, 94.62%, and 98.77%. The results show that classes one and two tend to be confused with each other, with error rates of about 2% and 2.87%, respectively. On the other hand, the error rates from the confusion between classes three and four lie between 1.23% and 5%.

6.2. k-Fold Cross-Validation. The second part of our experiment uses k-fold cross-validation to further confirm the classification performance on the driving actions. In k-fold cross-validation, the original dataset is randomly partitioned into k subsets. One subset is retained as the validation data for testing, while the remaining k − 1 subsets are used as training data. The cross-validation process is then repeated k times, which means that each of the k


Figure 9: Bar plots of classification rates from 10-fold cross-validation: kNN 94.64%, RF 98.30%, SVM 97.39%, MLP 95.91%.

Figure 10: Box plots of classification rates from 10-fold cross-validation.

subsamples is used exactly once as the validation data. The estimate is the average of the k results. The key property of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once. We chose 10-fold cross-validation in the experiment, which means that nine of the ten split sets are used for training and the remaining one is reserved for testing.
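The fold bookkeeping can be sketched as follows (function name ours; the strided slicing is one simple way to produce nearly equal folds):

```python
import random

def kfold_splits(n, k=10, seed=None):
    """Yield (train, test) index lists; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]        # k nearly equal folds
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test
```

Iterating over the generator gives the k train/test partitions; averaging the per-fold accuracies yields the cross-validation estimate described above.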

The evaluation procedure is similar to the holdout experiment. The cross-validation experiment was also conducted 100 times for each of the classification methods; each time, the PHOG features extracted from the driving action dataset were randomly divided into 10 folds. The average classification accuracies over the 100 repetitions are shown in the bar plots of Figure 9 and the box plots of Figure 10. The average classification accuracies of the kNN, RF, SVM, and MLP classifiers are 94.64%, 98.30%, 97.39%, and 95.91%, respectively. From the bar plots, box plots, and confusion matrix in Figures 9, 10, and 11, the RF classifier clearly outperforms the other three classifiers.

Figure 11: Confusion matrix of the RF classification from the 10-fold cross-validation experiment.

             Action 1   Action 2   Action 3   Action 4
Action 1      0.9657     0.0237     0.0004     0.0102
Action 2      0.0292     0.9708     0          0
Action 3      0.0041     0          0.9483     0.0476
Action 4      0          0          0.0147     0.9853

7. Conclusion

In this paper, we proposed an efficient approach to recognise driving action primitives by the joint application of the motion history image and the pyramid histogram of oriented gradients (PHOG). The proposed driving action primitives lead to a hierarchical representation of driver activities. The manually labelled action primitives are jointly represented by the motion history image and PHOG. The random forest classifier was exploited to evaluate the classification performance, giving an accuracy of over 94% in the holdout and cross-validation experiments. This compares favorably with some other commonly used classification methods.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The project is supported by Suzhou Municipal Science and Technology Foundation Grants SS201109 and SYG201140.

References

[1] WHO, "World report on road traffic injury prevention," http://www.who.int/violence_injury_prevention/publications/road_traffic/world_report/en/.
[2] T. B. Moeslund, A. Hilton, and V. Kruger, "A survey of advances in vision-based human motion capture and analysis," Computer Vision and Image Understanding, vol. 104, no. 2-3, pp. 90–126, 2006.
[3] X. Fan, B.-C. Yin, and Y.-F. Sun, "Yawning detection for monitoring driver fatigue," in Proceedings of the 6th International Conference on Machine Learning and Cybernetics (ICMLC '07), pp. 664–668, Hong Kong, August 2007.
[4] R. Grace, V. E. Byrne, D. M. Bierman et al., "A drowsy driver detection system for heavy vehicles," in Proceedings of the 17th Digital Avionics Systems Conference, vol. 2, pp. I361–I368, Bellevue, Wash, USA, 1998.
[5] E. Wahlstrom, O. Masoud, and N. Papanikolopoulos, "Vision-based methods for driver monitoring," in Proceedings of the IEEE Intelligent Transportation Systems, vol. 2, pp. 903–908, Shanghai, China, October 2003.
[6] A. Doshi and M. M. Trivedi, "On the roles of eye gaze and head dynamics in predicting driver's intent to change lanes," IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 3, pp. 453–462, 2009.
[7] J. J. Jo, S. J. Lee, H. G. Jung, K. R. Park, and J. J. Kim, "Vision-based method for detecting driver drowsiness and distraction in driver monitoring system," Optical Engineering, vol. 50, no. 12, pp. 127202–127224, 2011.
[8] X. Liu, Y. D. Zhu, and K. Fujimura, "Real-time pose classification for driver monitoring," in Proceedings of the IEEE Intelligent Transportation Systems, pp. 174–178, Singapore, September 2002.
[9] P. Watta, S. Lakshmanan, and Y. Hou, "Nonparametric approaches for estimating driver pose," IEEE Transactions on Vehicular Technology, vol. 56, no. 4, pp. 2028–2041, 2007.
[10] T. Kato, T. Fujii, and M. Tanimoto, "Detection of driver's posture in the car by using far infrared camera," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 339–344, Parma, Italy, June 2004.
[11] S. Y. Cheng, S. Park, and M. M. Trivedi, "Multi-spectral and multi-perspective video arrays for driver body tracking and activity analysis," Computer Vision and Image Understanding, vol. 106, no. 2-3, pp. 245–257, 2007.
[12] H. Veeraraghavan, N. Bird, S. Atev, and N. Papanikolopoulos, "Classifiers for driver activity monitoring," Transportation Research C, vol. 15, no. 1, pp. 51–67, 2007.
[13] H. Veeraraghavan, S. Atev, N. Bird, P. Schrater, and N. Papanikolopoulos, "Driver activity monitoring through supervised and unsupervised learning," in Proceedings of the 8th International IEEE Conference on Intelligent Transportation Systems, pp. 895–900, Vienna, Austria, September 2005.
[14] C. H. Zhao, B. L. Zhang, J. He, and J. Lian, "Recognition of driving postures by contourlet transform and random forests," IET Intelligent Transport Systems, vol. 6, no. 2, pp. 161–168, 2012.
[15] C. H. Zhao, B. L. Zhang, X. Z. Zhang, S. Q. Zhao, and H. X. Li, "Recognition of driving postures by combined features and random subspace ensemble of multilayer perceptron classifier," Neural Computing and Applications, vol. 22, no. 1 supplement, pp. 175–184, 2013.
[16] C. Tran, A. Doshi, and M. M. Trivedi, "Modeling and prediction of driver behavior by foot gesture analysis," Computer Vision and Image Understanding, vol. 116, no. 3, pp. 435–445, 2012.
[17] J. C. McCall and M. M. Trivedi, "Visual context capture and analysis for driver attention monitoring," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 332–337, Washington, DC, USA, October 2004.
[18] K. Torkkola, N. Massey, and C. Wood, "Driver inattention detection through intelligent analysis of readily available sensors," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 326–331, Washington, DC, USA, October 2004.
[19] D. A. Johnson and M. M. Trivedi, "Driving style recognition using a smartphone as a sensor platform," in Proceedings of the 14th IEEE International Intelligent Transportation Systems Conference (ITSC '11), pp. 1609–1615, Washington, DC, USA, October 2011.
[20] A. V. Desai and M. A. Haque, "Vigilance monitoring for operator safety: a simulation study on highway driving," Journal of Safety Research, vol. 37, no. 2, pp. 139–147, 2006.
[21] G. Rigas, Y. Goletsis, P. Bougia, and D. I. Fotiadis, "Towards driver's state recognition on real driving conditions," International Journal of Vehicular Technology, vol. 2011, Article ID 617210, 14 pages, 2011.
[22] M. H. Kutila, M. Jokela, T. Makinen, J. Viitanen, G. Markkula, and T. W. Victor, "Driver cognitive distraction detection: feature estimation and implementation," Proceedings of the Institution of Mechanical Engineers D, vol. 221, no. 9, pp. 1027–1040, 2007.
[23] A. F. Bobick and J. W. Davis, "The recognition of human movement using temporal templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 257–267, 2001.
[24] Y. Wang, K. Huang, and T. Tan, "Human activity recognition based on R transform," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–8, Minneapolis, Minn, USA, June 2007.
[25] H. Chen, H. Chen, Y. Chen, and S. Lee, "Human action recognition using star skeleton," in Proceedings of the International Workshop on Video Surveillance and Sensor Networks (VSSN '06), pp. 171–178, Santa Barbara, Calif, USA, October 2006.
[26] W. Liang and D. Suter, "Informative shape representations for human action recognition," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 1266–1269, Hong Kong, August 2006.
[27] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proceedings of the International Conference on Computer Vision, vol. 2, pp. 726–733, Nice, France, October 2003.
[28] A. R. Ahad, T. Ogata, J. K. Tan, H. S. Kim, and S. Ishikawa, "Motion recognition approach to solve overwriting in complex actions," in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG '08), pp. 1–6, Amsterdam, The Netherlands, September 2008.
[29] S. Ali and M. Shah, "Human action recognition in videos using kinematic features and multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 288–303, 2010.
[30] S. Danafar and N. Gheissari, "Action recognition for surveillance applications using optic flow and SVM," in Proceedings of the Asian Conference on Computer Vision (ACCV '07), pp. 457–466, Tokyo, Japan, November 2007.
[31] I. Laptev and T. Lindeberg, "Space-time interest points," in Proceedings of the 9th IEEE International Conference on Computer Vision, pp. 432–439, Nice, France, October 2003.
[32] I. Laptev, B. Caputo, C. Schuldt, and T. Lindeberg, "Local velocity-adapted motion events for spatio-temporal recognition," Computer Vision and Image Understanding, vol. 108, no. 3, pp. 207–229, 2007.
[33] S. Park and M. Trivedi, "Driver activity analysis for intelligent vehicles: issues and development framework," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 644–649, Las Vegas, Nev, USA, June 2005.
[34] A. Bosch, A. Zisserman, and X. Munoz, "Representing shape with a spatial pyramid kernel," in Proceedings of the 6th ACM International Conference on Image and Video Retrieval (CIVR '07), pp. 401–408, Amsterdam, The Netherlands, July 2007.
[35] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, San Diego, Calif, USA, June 2005.
[36] G. R. Bradski and J. Davis, "Motion segmentation and pose recognition with motion history gradients," in Proceedings of the 5th IEEE Workshop on Applications of Computer Vision, pp. 238–244, Palm Springs, Calif, USA, 2000.
[37] O. Masoud and N. Papanikolopoulos, "A method for human action recognition," Image and Vision Computing, vol. 21, no. 8, pp. 729–743, 2003.
[38] H. Yi, D. Rajan, and L.-T. Chia, "A new motion histogram to index motion content in video segments," Pattern Recognition Letters, vol. 26, no. 9, pp. 1221–1231, 2005.
[39] J. Flusser, T. Suk, and B. Zitova, Moments and Moment Invariants in Pattern Recognition, John Wiley & Sons, Chichester, UK, 2009.
[40] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), pp. 2169–2178, New York, NY, USA, June 2006.
[41] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, 2nd edition, 2000.
[42] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.



of interest in the driver's seat is a natural choice, as there is little noise and no partial occlusion in the video. The action silhouettes were first extracted to represent action primitives by applying MHI to aggregate the differences between subsequent frames. For better discrimination than MHI alone, the pyramid histogram of oriented gradients of the MHI was calculated as the feature set for further training and classification. PHOG is a spatial pyramid extension of the histogram of oriented gradients (HOG) [35] descriptor, which has been used extensively in computer vision. After a comparison of several different classification algorithms, the random forest (RF) classifier was chosen for the driving action recognition, offering satisfactory performance.

The rest of the paper is organized as follows. Section 2 gives a brief introduction to the creation of our driving posture dataset and the necessary preprocessing. Sections 3 and 4 review the motion history image and the pyramid histogram of oriented gradients, respectively, explaining how they are applied to driving posture description. Section 5 introduces the random forest classifier and three other commonly used classification methods for comparison. Section 6 reports the experimental results, followed by the conclusion in Section 7.

2. Driving Action Dataset Creation and Preprocessing

A driving action dataset was prepared, containing 20 video clips at 640 × 480 resolution and 24 fps. The video was recorded using a Nikon D90 camera at a car park of Xi'an Jiaotong-Liverpool University. Ten male drivers and ten female drivers participated in the experiment by pretending to drive in the car and conducting several actions that simulated real driving situations. Five predefined driving actions were imitated, that is, turning the steering wheel, operating the shift lever, eating, smoking, and using a cell phone.

There are five steps involved in simulating the drivingactivities by each participant

Step 1. The driver first grasps the steering wheel and turns it slightly.

Step 2. The driver's right hand moves to the shift lever and operates it several times before moving back to the steering wheel.

Step 3. The driver takes a cell phone from the dashboard, answers a phone call, and then puts it back with his or her right hand.

Step 4. The driver takes a cookie from the dashboard and eats it using his or her right hand.

Step 5. For male drivers, the driver takes a cigarette from the dashboard, puts it into his mouth, lights it with a lighter, and then puts the lighter back on the dashboard.

Twenty consecutive picture sequences were extracted from the videos of the dataset for further experimentation.

Figure 1: The vertical axis stands for the area of difference points compared to the previous frame, obtained by applying Otsu's thresholding method [26]; the horizontal axis stands for the frame number (0–6000).

2.1. Action Detection and Segmentation. As in many intelligent video analysis systems, action recognition should start with motion detection in a continuous video stream, for which many approaches are available. Among the popular approaches, frame differencing is the simplest and most efficient method; it takes the difference between two frames to detect the moving object. Frame differencing is widely applied with proven performance, particularly when a fixed camera is used to observe dynamic events in a scene.

For each driving action sequence, the frame differences between adjacent image frames are first calculated, followed by a thresholding operation to identify moving objects. Otsu's thresholding method [26] was chosen, which minimizes the intraclass variance of the black and white pixels. The existence of moving objects is determined by evaluating whether there exist connected regions in the binary image, and the location of the moving objects can be further calculated from the total areas and coordinates of the connected regions. The details are illustrated in Figure 1.
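The frame-differencing and thresholding step can be sketched on flattened grayscale frames. The Otsu implementation below maximizes the between-class variance, which is equivalent to minimizing the intraclass variance described above; function names are ours.

```python
def otsu_threshold(pixels, levels=256):
    """Otsu's method: pick the threshold maximizing between-class variance
    (equivalently, minimizing intraclass variance)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var, w0, sum0 = 0, -1.0, 0, 0.0
    for t in range(levels - 1):
        w0 += hist[t]                        # weight of the "dark" class
        if w0 == 0:
            continue
        w1 = total - w0                      # weight of the "bright" class
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2   # between-class variance
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def binary_motion_mask(prev_frame, curr_frame):
    """Absolute frame difference followed by Otsu thresholding."""
    diff = [abs(a - b) for a, b in zip(prev_frame, curr_frame)]
    t = otsu_threshold(diff)
    return [1 if d > t else 0 for d in diff]
```

The resulting binary mask marks the moving-object pixels; connected-region analysis on this mask then localizes the moving object, as described in the text.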

In the following section, the segmented action images are further manually labelled into four categories of action primitives based on the trajectory of the driver's right hand, as shown in Figure 2. The first category is moving to the shift lever with the right hand from the steering wheel, or moving back to the steering wheel from the shift lever. The second category is operating the shift lever with the right hand. The third category is moving to the dashboard from the steering wheel with the right hand, or moving back to the steering wheel from the dashboard. The fourth category is moving to the head from the steering wheel, or moving back to the steering wheel from the head, with the right hand.

3. Motion History Image (MHI)

The motion history image (MHI) approach is a view-based temporal template approach developed by Bobick and Davis [23], which is simple but robust in the representation of movements and is widely employed in action recognition,


motion analysis, and other related applications [36–38]. The motion history image (MHI) describes how motion occurs in an image sequence. Another representation, the motion energy image (MEI), demonstrates the presence of any motion or spatial pattern in the image sequence. Together, the MHI and MEI templates comprise the temporal template-matching method.

Figure 3 shows a movement in driving. The first row shows some key frames of a driving action. The second and third rows are the frame differences and the corresponding binary images obtained by applying Otsu's thresholding. The fourth and fifth rows are the cumulative MEI and MHI images, respectively. An MHI's pixel intensity is a function of the recency of motion in a sequence, where brighter values correspond to more recent motion. We currently use a simple replacement and linear decay operator applied to the binary difference frames. The formal definitions are briefly explained below:

diff(x, y, t) = { zeros(x, y),                     if t = 1,
                { |I(x, y, t − 1) − I(x, y, t)|,   otherwise,        (1)

where diff(x, y, t) is a difference image sequence indicating the difference compared to the previous frame. Let

D(x, y, t) = { 0,   if diff(x, y, t) < threshold,
             { 1,   otherwise,                                       (2)

where 119863(119909 119910 119905) is binary images sequence indicating regionof motion Then the motion energy image is defined as

MEI (119909 119910 119905) =119905 = action end frame

119905 = action start frame119863(119909 119910 119905) (3)

Both the motion history image and the motion energy image were introduced to capture motion information in images [23]. While the MEI only indicates where motion occurs, the motion history image $\operatorname{MHI}(x, y, t)$ represents how the object moves, and can be defined as

$$
\operatorname{MHI}(x, y, t) =
\begin{cases}
255, & \text{if } D(x, y, t) = 1 \\[4pt]
\max\bigl(0, \operatorname{MHI}(x, y, t-1) - 1\bigr), & \text{if } D(x, y, t) \neq 1 \text{ and } \dfrac{255}{N} \leq 1 \\[4pt]
\max\Bigl(0, \operatorname{MHI}(x, y, t-1) - \Bigl\lfloor \dfrac{255}{N} \Bigr\rfloor\Bigr), & \text{if } D(x, y, t) \neq 1 \text{ and } \dfrac{255}{N} > 1
\end{cases}
\tag{4}
$$

where $N$ is the length of the picture sequence.

The result is a scalar-valued image in which the most recently moving pixels are the brightest. The MHI can represent the location, shape, and movement direction of an action in a picture sequence. Since the MEI can be obtained by thresholding the MHI above zero, we consider only features derived from the MHI in the following.
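As a concrete illustration, the replacement-and-decay recursion of equations (1)–(4) can be sketched in a few lines of NumPy. This is a minimal sketch on a synthetic frame sequence, not the authors' implementation; the function name `motion_templates`, the fixed threshold, and the decay step $\lfloor 255/N \rfloor$ are assumptions drawn from the definitions above.

```python
import numpy as np

def motion_templates(frames, threshold=30):
    """Compute MEI and MHI for a list of grayscale frames (2-D uint8 arrays),
    following equations (1)-(4): frame differencing, binary thresholding,
    union for the MEI, and replacement with linear decay for the MHI."""
    h, w = frames[0].shape
    mei = np.zeros((h, w), dtype=np.uint8)
    mhi = np.zeros((h, w), dtype=np.float32)
    decay = max(1, 255 // len(frames))  # floor(255 / sequence length)
    for t in range(1, len(frames)):
        # diff(x, y, t): absolute difference with the previous frame, eq. (1)
        diff = np.abs(frames[t].astype(np.int16) - frames[t - 1].astype(np.int16))
        d = (diff >= threshold).astype(np.uint8)      # D(x, y, t), eq. (2)
        mei = np.logical_or(mei, d).astype(np.uint8)  # union over frames, eq. (3)
        # MHI: reset moving pixels to 255, decay the rest, eq. (4)
        mhi = np.where(d == 1, 255.0, np.maximum(0.0, mhi - decay))
    return mei, mhi
```

Running this on a short clip of a moving patch yields an MHI whose brightest pixels mark the most recent motion, as described above.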

After the driving actions were detected and segmented from the raw video dataset, motion history images were extracted for each of the four decomposed action sets. Figure 4 demonstrates how the motion history image is calculated to represent the movement of each decomposed action sequence. In the figure, the left and middle columns are the start and end frames of a decomposed action snippet, respectively; the right column is the MHI calculated for the corresponding snippet.

4 Pyramid Histogram of Oriented Gradients (PHOG)

The motion history image is not appropriate for direct use as a feature for comparison or classification in practical applications. In the basic MHI method [23], after calculating the MHI and MEI, feature vectors are computed from the seven higher-order Hu moments and then used for recognition. However, Hu's moment invariants have some drawbacks, particularly limited recognition power [39]. In this paper, the histogram of oriented gradients feature is instead extracted from the MHI as a more suitable feature for classification.

In many image processing tasks, the local geometrical shapes within an image can be characterized by the distribution of edge directions, called the histogram of oriented gradients (HOG) [35]. HOG is computed by evaluating a dense grid of well-normalized local histograms of image gradient orientations over the image window. HOG has some important advantages over other local shape descriptors; for example, it is invariant to small deformations and robust to outliers and noise.

The HOG feature encodes the gradient orientations of an image patch without considering where in the patch each orientation originates. It is therefore not discriminative enough when the spatial layout of the patch's underlying structure is important. The objective of the pyramid histogram of oriented gradients (PHOG) [34], a more recently proposed descriptor, is to take the spatial property of the local shape into account while representing the image by HOG. The spatial information is captured by tiling the image into regions at multiple resolutions, following spatial pyramid matching [40]. Each image is divided into a sequence of increasingly finer spatial grids by repeatedly doubling the number of divisions along each axis. The number of points in each grid cell is then recorded.

Figure 2: Four manually decomposed action primitives. (a) Hand interaction with the shift lever. (b) Hand operating the shift lever. (c) Hand interaction with the head. (d) Hand interaction with the dashboard.

The number of points in a cell at one level is simply the sum over the four cells it is divided into at the next level, thus forming a pyramid representation. The cell counts at each resolution level are the bin counts of the histogram representing that level. A soft correspondence between two point sets can then be computed as a weighted sum over the histogram intersections at each level.

The resolution of an MHI image is 640 × 480. An MHI is divided into small spatial cells at different pyramid levels. We follow the practice in [34] by limiting the number of levels to $L = 3$ to prevent overfitting. Figure 5 shows that the pyramid at level $l$ has $2^{l} \times 2^{l}$ cells.

The magnitude $m(x, y)$ and orientation $\theta(x, y)$ of the gradient at a pixel $(x, y)$ are calculated as follows:

$$
m(x, y) = \sqrt{g_x(x, y)^2 + g_y(x, y)^2}, \qquad
\theta(x, y) = \arctan \frac{g_x(x, y)}{g_y(x, y)},
\tag{5}
$$

where $g_x(x, y)$ and $g_y(x, y)$ are the image gradients along the $x$ and $y$ directions. Each gradient orientation is quantized into $K$ bins. In each cell of every level, the gradients over all pixels are accumulated to form a local $K$-bin histogram. As a result, an ROI at level $l$ is represented by a vector of dimension $K \cdot 2^{l} \cdot 2^{l}$. All the cells at the different pyramid levels are combined to form the final PHOG vector of dimension $d = K \sum_{l=0}^{L} 4^{l}$, which represents the whole ROI.

The dimension of the PHOG feature (e.g., $d = 680$ when $K = 8$ and $L = 3$) is relatively high. Many dimension reduction methods can be applied to alleviate this problem; we employ the widely used principal component analysis (PCA) [41] for its simplicity and effectiveness.
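The pyramid construction and equation (5) can be sketched as follows. This is an illustrative NumPy version, not the descriptor of [34] verbatim: the function name `phog`, the unweighted folding of orientations into [0, π), and the use of `np.gradient` for the image derivatives are my assumptions. With $K = 8$ and $L = 3$ it returns the 680-dimensional vector mentioned above.

```python
import numpy as np

def phog(image, K=8, L=3):
    """PHOG sketch: magnitude-weighted K-bin orientation histograms collected
    over a spatial pyramid; level l contributes a 2^l x 2^l grid of cells,
    so the descriptor has K * sum(4^l for l in 0..L) dimensions."""
    gy, gx = np.gradient(image.astype(np.float64))
    mag = np.sqrt(gx ** 2 + gy ** 2)          # m(x, y) as in equation (5)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # orientation folded into [0, pi)
    bins = np.minimum((ang / np.pi * K).astype(int), K - 1)
    h, w = image.shape
    feat = []
    for l in range(L + 1):
        n = 2 ** l                            # grid is n x n at level l
        for i in range(n):
            for j in range(n):
                ys = slice(i * h // n, (i + 1) * h // n)
                xs = slice(j * w // n, (j + 1) * w // n)
                feat.append(np.bincount(bins[ys, xs].ravel(),
                                        weights=mag[ys, xs].ravel(),
                                        minlength=K))
    return np.concatenate(feat)
```

PCA could then be applied to the stacked descriptors to reduce the 680 dimensions, as the text suggests.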

5 Random Forest (RF) and Other Classification Algorithms

Random forest (RF) [42] is an ensemble classifier built from many decision trees, and it can be used for classification or regression. A particular advantage of RF is that accuracy and variable importance information are provided with the results. A random forest grows a number of classification trees. When a vector representing a new object is input for classification, it is sent to every tree in the forest. A different subset of the training data (≈2/3) is selected, with replacement, to train each tree; the remaining training data are used to estimate the error and the variable importance. Class assignment is made by the majority of votes from all of the trees.

Figure 3: Example of the driver's right hand moving to the shift lever from the steering wheel. The first row shows key frames of the driving action; the second row shows the corresponding frame-difference images; the third row shows the binary images resulting from thresholding; the fourth row shows the cumulative motion energy images; and the fifth row shows the cumulative motion history images.

RF has only two hyperparameters: the number of variables $M$ in the random subset at each node and the number of trees $T$ in the forest [42]. Breiman's RF error rate depends on two factors: the correlation between any pair of trees and the strength of each individual tree in the forest. Increasing the correlation increases the forest error rate, while increasing the strength of the individual trees decreases the misclassification rate. Reducing $M$ reduces both the correlation and the strength; $M$ is often set to the square root of the number of inputs.

When the training set for a particular tree is drawn by sampling with replacement, about one-third of the cases are left out of the sample set.

The RF algorithm can be summarised as follows.

(1) Choose the parameter $T$, the number of trees to grow.

(2) Choose the parameter $m$, the number of variables used to split each node, with $m \ll M$, where $M$ is the number of input variables; $m$ is held constant while growing the forest.

(3) Grow $T$ trees. When growing each tree, do the following.

(i) Construct a bootstrap sample of size $n$, drawn from $S_n = \{(X_i, y_i)\}_{i=1}^{n}$ with replacement, and grow a tree from this bootstrap sample.

(ii) At each node, select $m$ variables at random and use them to find the best split.

(iii) Grow the tree to its maximal extent; there is no pruning.

(4) To classify a point $X$, collect the votes from every tree in the forest and then use majority voting to decide the class label.
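Steps (1)–(4) can be illustrated with a deliberately small NumPy sketch: a two-class forest of depth-1 trees (decision stumps) grown on bootstrap samples, each trying $m = \sqrt{M}$ randomly chosen variables. This toy version, including the names `grow_forest` and `forest_predict`, the median thresholds, and the use of stumps instead of full unpruned trees, is my assumption for illustration, not the paper's implementation.

```python
import numpy as np

def grow_forest(X, y, T=51, rng=None):
    """Grow T bootstrap-trained stumps; each stump tries m = sqrt(M)
    randomly chosen variables and keeps the lowest-error split."""
    rng = np.random.default_rng() if rng is None else rng
    n, M = X.shape
    m = max(1, int(np.sqrt(M)))              # variables tried at the node
    forest = []
    for _ in range(T):
        b = rng.integers(0, n, n)            # bootstrap sample, with replacement
        Xb, yb = X[b], y[b]
        best_err, best = np.inf, None
        for f in rng.choice(M, m, replace=False):
            thr = np.median(Xb[:, f])
            left = Xb[:, f] <= thr
            for lo, hi in ((0, 1), (1, 0)):  # which side predicts which class
                err = np.mean(np.where(left, lo, hi) != yb)
                if err < best_err:
                    best_err, best = err, (f, thr, lo, hi)
        forest.append(best)
    return forest

def forest_predict(forest, X):
    """Step (4): majority vote over all stumps in the forest."""
    votes = np.array([np.where(X[:, f] <= thr, lo, hi)
                      for f, thr, lo, hi in forest])
    return (votes.mean(axis=0) > 0.5).astype(int)
```

On data where only one of the four features is informative, trees that never see that feature vote roughly at random, and the majority vote still recovers the correct labels, which is exactly the correlation-versus-strength trade-off described above.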

In this paper, we also compare the accuracy of RF with several popular classification methods, including the $k$-nearest neighbour (kNN) classifier, the multilayer perceptron (MLP), and the support vector machine (SVM), on the driving action dataset.

5.1 Other Classification Methods

5.1.1 k-Nearest Neighbour Classifier

The k-nearest neighbour (kNN) classifier, one of the most classic and simplest classifiers in machine learning, assigns an object the label held by the majority of its neighbours, based on minimal distance to the training examples in feature space [41]. As a type of lazy learning, the kNN classifier performs no distance computation or comparison until the test data are given. Specifically, the object is assigned to the most common class among its k nearest neighbours; when k = 1, it is simply given the class of its nearest neighbour. Theoretically, the error rate of the kNN algorithm approaches the Bayes error as the training set size tends to infinity. In practice, however, good kNN performance requires a large training dataset, which makes the computation expensive.
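A minimal version of the majority-vote rule just described; the helper name `knn_predict` and the Euclidean metric are my choices for illustration.

```python
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    """Classify x as the most common label among its k nearest
    training examples (Euclidean distance, majority vote)."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every example
    nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest
    labels, counts = np.unique(nearest, return_counts=True)
    return labels[np.argmax(counts)]              # majority vote
```

Note that every prediction scans the whole training set, which is the computational cost the text mentions.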

Figure 4: MHIs for different driving actions. (a) Right hand moving to the shift lever. (b) Right hand moving back to the steering wheel from the shift lever. (c) Right hand operating the shift lever. (d) Operating the shift lever. (e) Right hand moving to the head from the steering wheel. (f) Right hand moving back to the steering wheel. (g) Right hand moving back to the steering wheel from the dashboard. (h) Right hand moving to the dashboard from the steering wheel.

Figure 5: A schematic illustration of PHOG at levels 0, 1, and 2. At each resolution level, PHOG consists of a histogram of orientation gradients over each image subregion.

5.1.2 Multilayer Perceptron Classifier

In neural networks, the multilayer perceptron (MLP) is an extension of the single-layer linear perceptron obtained by adding hidden layers in between [41]. An MLP is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. It consists of multiple layers of nodes: an input layer, one or more hidden layers, and an output layer. An MLP classifier is usually trained by the error backpropagation algorithm.

5.1.3 Support Vector Machine

The support vector machine (SVM) is one of the most commonly applied supervised learning algorithms. An SVM is formally defined by a separating hyperplane in a high- or infinite-dimensional space. Given labelled training data, the SVM generates an optimal hyperplane with which to categorize new examples. Intuitively, the SVM algorithm finds the hyperplane that gives the largest minimum distance to the training examples; that is, the optimal separating hyperplane maximizes the margin of the training data.
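The "largest minimum distance" criterion can be made concrete in a few lines of NumPy: for a candidate hyperplane $w \cdot x + b = 0$, the margin is the smallest distance from any training point to the plane, and the SVM selects the $w, b$ that maximise it. The helper `margin` and the toy points are illustrative only.

```python
import numpy as np

def margin(w, b, X):
    """Smallest distance from any row of X to the hyperplane w.x + b = 0."""
    return np.min(np.abs(X @ w + b) / np.linalg.norm(w))

# Four toy points on the line y = x; for the hyperplane x + y = 0 the
# closest points are (1, 1) and (-1, -1), at distance sqrt(2).
X = np.array([[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0], [-2.0, -2.0]])
m = margin(np.array([1.0, 1.0]), 0.0, X)
```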

6 Experiments

6.1 Holdout Experiment

We choose two standard experimental procedures, namely the holdout approach and the cross-validation approach, to verify the driving action recognition performance using the RF classifier and the PHOG features extracted from the MHI. The other three classifiers, kNN, MLP, and SVM, are compared against it.

Figure 6: Bar plots of classification rates from the holdout experiment, with 80% of the data used for training and the remainder for testing (kNN 88.01%, RF 96.56%, SVM 94.43%, MLP 90.93%).

In the holdout experiment, 20% of the PHOG features are randomly selected as the testing dataset, while the remaining 80% are used as the training dataset. The holdout experiment is repeated 100 times and the classification results are recorded. In each holdout cycle, the same training and testing datasets are applied to the four different classifiers simultaneously so that their performance can be compared.
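The repeated 80/20 protocol can be sketched as a random index split (the function name and the use of NumPy's permutation are my assumptions); in each cycle, the same pair of index sets would be fed to all four classifiers.

```python
import numpy as np

def holdout_split(n, test_frac=0.2, rng=None):
    """Randomly split n sample indices into train and test index sets."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.permutation(n)
    n_test = int(round(n * test_frac))
    return idx[n_test:], idx[:n_test]   # (train indices, test indices)
```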

Classification accuracy is one of the most common indicators of classification performance. Figures 6 and 7 show the bar plots and box plots of the classification accuracies of the four classifiers on the same decomposed driving actions, averaged over the 100 runs. The average classification accuracies of the kNN, RF, SVM, and MLP classifiers are 88.01%, 96.56%, 94.43%, and 90.93%, respectively. The RF classifier clearly performs best among the four classifiers compared.

To further evaluate the performance of the RF classifier, a confusion matrix is used to visualize the discrepancy between the actual class labels and the predicted results. The confusion matrix gives the full picture of the errors made by a classification model: the rows correspond to the known classes of the data, that is, the labels in the data, and the columns correspond to the predictions made by the model. Each element of the matrix is the number of predictions whose predicted class is given by the column and whose correct class is given by the row. Thus, the diagonal elements show the number of correct classifications made for each class, while the off-diagonal elements show the errors made. Figure 8 shows the confusion matrix of the RF classifier from the above experiment.
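The row-normalised confusion matrix described here (and shown in Figure 8) can be built directly from the label pairs; the helper below is a generic sketch, not tied to the paper's data.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true classes and columns predicted classes; each row is
    normalised so the diagonal holds the per-class classification rate."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1                         # count (true, predicted) pairs
    return cm / cm.sum(axis=1, keepdims=True)  # normalise per true class
```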

Figure 7: Box plots of classification rates from the holdout experiment, with 80% of the data used for training and the remainder for testing.

Figure 8: Confusion matrix of the RF classification result from the holdout experiment (rows: true class; columns: predicted class).

              Action 1   Action 2   Action 3   Action 4
  Action 1     0.9570     0.0200     0.0100     0.0130
  Action 2     0.0287     0.9713     0.0000     0.0000
  Action 3     0.0038     0.0000     0.9462     0.0500
  Action 4     0.0000     0.0000     0.0123     0.9877

In the figure, classes one, two, three, and four correspond to hand interaction with the shift lever, operating the shift lever, interaction with the head, and interaction with the dashboard, respectively. In the confusion matrix, the columns are the predicted classes and the rows are the true ones. For the RF classifier, the average classification rate over the four driving actions is 96.56%, and the respective accuracies for the four actions in the holdout experiment are 95.70%, 97.13%, 94.62%, and 98.77%. Classes one and two tend to be confused with each other, with error rates of about 2.00% and 2.87%, respectively, while the error rates from confusion between classes three and four lie between 1.23% and 5.00%.

6.2 k-Fold Cross-Validation

The second part of our experiment uses k-fold cross-validation to further confirm the classification performance on the driving actions. In k-fold cross-validation, the original data are randomly partitioned into k subsets. One subset is retained as the validation data for testing, while the remaining k − 1 subsets are used as training data. The process is repeated k times, so that each of the k subsamples is used exactly once as the validation data, and the estimate is the average of the k results. The key property of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once. We chose 10-fold cross-validation in the experiment, meaning that nine of the ten split subsets are used for training and the remaining one is reserved for testing.

Figure 9: Bar plots of classification rates from 10-fold cross-validation (kNN 94.64%, RF 98.30%, SVM 97.39%, MLP 95.91%).

Figure 10: Box plots of classification rates from 10-fold cross-validation.
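The 10-fold partitioning just described can be sketched as an index generator (the name `kfold_indices` and the use of `np.array_split` are my choices): every index appears in exactly one validation fold.

```python
import numpy as np

def kfold_indices(n, k=10, rng=None):
    """Yield (train, test) index arrays for k-fold cross-validation;
    each of the k folds is used exactly once as the validation set."""
    rng = np.random.default_rng() if rng is None else rng
    folds = np.array_split(rng.permutation(n), k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]
```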

The evaluation procedure is similar to the holdout experiment. The cross-validation experiment was also conducted 100 times for each classification method; each time, the PHOG features extracted from the driving action dataset were randomly divided into 10 folds. The average classification accuracies over the 100 repetitions are shown in the bar plots of Figure 9 and the box plots of Figure 10. The average classification accuracies of the kNN, RF, SVM, and MLP classifiers are 94.64%, 98.30%, 97.39%, and 95.91%, respectively. From the bar plots, box plots, and confusion matrix in Figures 9, 10, and 11, the RF classifier clearly outperforms the other three classifiers.

Figure 11: Confusion matrix of the RF classification from the 10-fold cross-validation experiment (rows: true class; columns: predicted class).

              Action 1   Action 2   Action 3   Action 4
  Action 1     0.9657     0.0237     0.0004     0.0102
  Action 2     0.0292     0.9708     0.0000     0.0000
  Action 3     0.0041     0.0000     0.9483     0.0476
  Action 4     0.0000     0.0000     0.0147     0.9853

7 Conclusion

In this paper, we proposed an efficient approach to recognise driving action primitives by the joint application of motion history images and pyramid histograms of oriented gradients. The proposed driving action primitives lead to a hierarchical representation of driver activities. The manually labelled action primitives are jointly represented by the motion history image and the pyramid histogram of oriented gradients (PHOG). The random forest classifier was exploited to evaluate the classification performance, giving an accuracy of over 94% in both the holdout and cross-validation experiments and comparing favorably with several other commonly used classification methods.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The project is supported by Suzhou Municipal Science and Technology Foundation Grants SS201109 and SYG201140.

References

[1] WHO, "World report on road traffic injury prevention," http://www.who.int/violence_injury_prevention/publications/road_traffic/world_report/en/.

[2] T. B. Moeslund, A. Hilton, and V. Kruger, "A survey of advances in vision-based human motion capture and analysis," Computer Vision and Image Understanding, vol. 104, no. 2-3, pp. 90–126, 2006.

[3] X. Fan, B.-C. Yin, and Y.-F. Sun, "Yawning detection for monitoring driver fatigue," in Proceedings of the 6th International Conference on Machine Learning and Cybernetics (ICMLC '07), pp. 664–668, Hong Kong, August 2007.

[4] R. Grace, V. E. Byrne, D. M. Bierman, et al., "A drowsy driver detection system for heavy vehicles," in Proceedings of the 17th Digital Avionics Systems Conference, vol. 2, pp. I36/1–I36/8, Bellevue, Wash, USA, 1998.

[5] E. Wahlstrom, O. Masoud, and N. Papanikolopoulos, "Vision-based methods for driver monitoring," in Proceedings of the IEEE Intelligent Transportation Systems, vol. 2, pp. 903–908, Shanghai, China, October 2003.

[6] A. Doshi and M. M. Trivedi, "On the roles of eye gaze and head dynamics in predicting driver's intent to change lanes," IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 3, pp. 453–462, 2009.

[7] J. J. Jo, S. J. Lee, H. G. Jung, K. R. Park, and J. J. Kim, "Vision-based method for detecting driver drowsiness and distraction in driver monitoring system," Optical Engineering, vol. 50, no. 12, pp. 127202–127224, 2011.

[8] X. Liu, Y. D. Zhu, and K. Fujimura, "Real-time pose classification for driver monitoring," in Proceedings of the IEEE Intelligent Transportation Systems, pp. 174–178, Singapore, September 2002.

[9] P. Watta, S. Lakshmanan, and Y. Hou, "Nonparametric approaches for estimating driver pose," IEEE Transactions on Vehicular Technology, vol. 56, no. 4, pp. 2028–2041, 2007.

[10] T. Kato, T. Fujii, and M. Tanimoto, "Detection of driver's posture in the car by using far infrared camera," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 339–344, Parma, Italy, June 2004.

[11] S. Y. Cheng, S. Park, and M. M. Trivedi, "Multi-spectral and multi-perspective video arrays for driver body tracking and activity analysis," Computer Vision and Image Understanding, vol. 106, no. 2-3, pp. 245–257, 2007.

[12] H. Veeraraghavan, N. Bird, S. Atev, and N. Papanikolopoulos, "Classifiers for driver activity monitoring," Transportation Research C, vol. 15, no. 1, pp. 51–67, 2007.

[13] H. Veeraraghavan, S. Atev, N. Bird, P. Schrater, and N. Papanikolopoulos, "Driver activity monitoring through supervised and unsupervised learning," in Proceedings of the 8th International IEEE Conference on Intelligent Transportation Systems, pp. 895–900, Vienna, Austria, September 2005.

[14] C. H. Zhao, B. L. Zhang, J. He, and J. Lian, "Recognition of driving postures by contourlet transform and random forests," IET Intelligent Transport Systems, vol. 6, no. 2, pp. 161–168, 2012.

[15] C. H. Zhao, B. L. Zhang, X. Z. Zhang, S. Q. Zhao, and H. X. Li, "Recognition of driving postures by combined features and random subspace ensemble of multilayer perceptron classifier," Neural Computing and Applications, vol. 22, no. 1 Supplement, pp. 175–184, 2013.

[16] C. Tran, A. Doshi, and M. M. Trivedi, "Modeling and prediction of driver behavior by foot gesture analysis," Computer Vision and Image Understanding, vol. 116, no. 3, pp. 435–445, 2012.

[17] J. C. McCall and M. M. Trivedi, "Visual context capture and analysis for driver attention monitoring," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 332–337, Washington, DC, USA, October 2004.

[18] K. Torkkola, N. Massey, and C. Wood, "Driver inattention detection through intelligent analysis of readily available sensors," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 326–331, Washington, DC, USA, October 2004.

[19] D. A. Johnson and M. M. Trivedi, "Driving style recognition using a smartphone as a sensor platform," in Proceedings of the 14th IEEE International Intelligent Transportation Systems Conference (ITSC '11), pp. 1609–1615, Washington, DC, USA, October 2011.

[20] A. V. Desai and M. A. Haque, "Vigilance monitoring for operator safety: a simulation study on highway driving," Journal of Safety Research, vol. 37, no. 2, pp. 139–147, 2006.

[21] G. Rigas, Y. Goletsis, P. Bougia, and D. I. Fotiadis, "Towards driver's state recognition on real driving conditions," International Journal of Vehicular Technology, vol. 2011, Article ID 617210, 14 pages, 2011.

[22] M. H. Kutila, M. Jokela, T. Makinen, J. Viitanen, G. Markkula, and T. W. Victor, "Driver cognitive distraction detection: feature estimation and implementation," Proceedings of the Institution of Mechanical Engineers D, vol. 221, no. 9, pp. 1027–1040, 2007.

[23] A. F. Bobick and J. W. Davis, "The recognition of human movement using temporal templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 257–267, 2001.

[24] Y. Wang, K. Huang, and T. Tan, "Human activity recognition based on R transform," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–8, Minneapolis, Minn, USA, June 2007.

[25] H. Chen, H. Chen, Y. Chen, and S. Lee, "Human action recognition using star skeleton," in Proceedings of the International Workshop on Video Surveillance and Sensor Networks (VSSN '06), pp. 171–178, Santa Barbara, Calif, USA, October 2006.

[26] W. Liang and D. Suter, "Informative shape representations for human action recognition," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 1266–1269, Hong Kong, August 2006.

[27] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proceedings of the International Conference on Computer Vision, vol. 2, pp. 726–733, Nice, France, October 2003.

[28] A. R. Ahad, T. Ogata, J. K. Tan, H. S. Kim, and S. Ishikawa, "Motion recognition approach to solve overwriting in complex actions," in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG '08), pp. 1–6, Amsterdam, The Netherlands, September 2008.

[29] S. Ali and M. Shah, "Human action recognition in videos using kinematic features and multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 288–303, 2010.

[30] S. Danafar and N. Gheissari, "Action recognition for surveillance applications using optic flow and SVM," in Proceedings of the Asian Conference on Computer Vision (ACCV '07), pp. 457–466, Tokyo, Japan, November 2007.

[31] I. Laptev and T. Lindeberg, "Space-time interest points," in Proceedings of the 9th IEEE International Conference on Computer Vision, pp. 432–439, Nice, France, October 2003.

[32] I. Laptev, B. Caputo, C. Schuldt, and T. Lindeberg, "Local velocity-adapted motion events for spatio-temporal recognition," Computer Vision and Image Understanding, vol. 108, no. 3, pp. 207–229, 2007.

[33] S. Park and M. Trivedi, "Driver activity analysis for intelligent vehicles: issues and development framework," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 644–649, Las Vegas, Nev, USA, June 2005.

[34] A. Bosch, A. Zisserman, and X. Munoz, "Representing shape with a spatial pyramid kernel," in Proceedings of the 6th ACM International Conference on Image and Video Retrieval (CIVR '07), pp. 401–408, Amsterdam, The Netherlands, July 2007.

[35] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, San Diego, Calif, USA, June 2005.

[36] G. R. Bradski and J. Davis, "Motion segmentation and pose recognition with motion history gradients," in Proceedings of the 5th IEEE Workshop on Applications of Computer Vision, pp. 238–244, Palm Springs, Calif, USA, 2000.

[37] O. Masoud and N. Papanikolopoulos, "A method for human action recognition," Image and Vision Computing, vol. 21, no. 8, pp. 729–743, 2003.

[38] H. Yi, D. Rajan, and L.-T. Chia, "A new motion histogram to index motion content in video segments," Pattern Recognition Letters, vol. 26, no. 9, pp. 1221–1231, 2005.

[39] J. Flusser, T. Suk, and B. Zitová, Moments and Moment Invariants in Pattern Recognition, John Wiley & Sons, Chichester, UK, 2009.

[40] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), pp. 2169–2178, New York, NY, USA, June 2006.

[41] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, 2nd edition, 2000.

[42] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.


4 International Journal of Vehicular Technology

motion analysis and other related applications [36ndash38] Themotion history image (MHI) can describe how the motion ismoving in the image sequence Another representation calledmotion energy image (MEI) can demonstrate the presence ofany motion or a spatial pattern in the image sequence BothMHI and MEI templates comprise the motion history image(MHI) template-matching method

Figure 3 shows a movement in driving The first row issome key frames in a driving action The second and thirdrows are the frame differences and the corresponding binaryimages from applying Otsursquos thresholding The fourth andfifth rows are cumulative MEI and MHI images respectivelyMHIrsquos pixel intensity is a function of the recency of motion ina sequence where brighter values correspond to more recentmotion We currently use a simple replacement and lineardecay operator using the binary image difference framesTheformal definitions are briefly explained below

diff (119909 119910 119905) = zeros (119909 119910 1) if 119905 = 11003816100381610038161003816119868 (119909 119910 119905 minus 1) minus 119868 (119909 119910 119905)

1003816100381610038161003816 otherwise(1)

where diff (119909 119910 119905) is a difference image sequence indicatingthe difference compared to previous frame Let

119863(119909 119910 119905) = 0 if diff (119909 119910 119905) lt threshold1 otherwise

(2)

where 119863(119909 119910 119905) is binary images sequence indicating regionof motion Then the motion energy image is defined as

MEI (119909 119910 119905) =119905 = action end frame

119905 = action start frame119863(119909 119910 119905) (3)

Both motion history images and motion energy imageswere introduced to capture motion information in images[23] While MEI only indicates where the motion is motionhistory image MHI (119909 119910 119905) represents the way the objectmoving which can be defined as

MHI =

255 if 119863(119909 119910 119905) = 1

max 0MHI (119909 119910 119905 minus 1) minus 1 if 119863(119909 119910 119905) = 1255

picseqlengthle 1

max0MHI (119909 119910 119905 minus 1) minus floor ( 255

pic seq length) if 119863(119909 119910 119905) = 1

255

picseqlengthgt 1

(4)

The result is a scalar-valued image where latest movingpixels are the brightest MHI can represent the location theshape and the movement direction of an action in a picturesequence As MEI can be obtained by thresholding the MHIabove zero we will only consider features derived fromMHIin the following

After the driving actions were detected and segmented from the raw video dataset, motion history images were extracted for each of the four decomposed action sets. Figure 4 demonstrates how the motion history image is calculated to represent the movement of each decomposed action sequence. In the figure, the left column and the middle column are the start and end frames of a decomposed action snippet, respectively. The right column is the MHI calculated for the corresponding action snippet.

4. Pyramid Histogram of Oriented Gradients (PHOG)

The motion history image is not appropriate to be directly exploited as a feature for comparison or classification in practical applications. In the basic MHI method [23], after calculating the MHI and MEI, feature vectors are computed from the seven higher-order Hu moments; these feature vectors are then used for recognition. However, Hu's moment invariants have some drawbacks, particularly limited recognition power [39]. In this paper, the histogram of oriented gradients feature is extracted from the MHI as a more suitable feature for classification.

In many image processing tasks, the local geometrical shapes within an image can be characterized by the distribution of edge directions, called histograms of oriented gradients (HOG) [35]. HOG can be calculated by evaluating a dense grid of well-normalized local histograms of image gradient orientations over the image windows. HOG has some important advantages over other local shape descriptors; for example, it is invariant to small deformations and robust to outliers and noise.

The HOG feature encodes the gradient orientations of an image patch without considering where each orientation originates within the patch. Therefore, it is not discriminative enough when the spatial layout of the underlying image structure is important. The objective of the more recently proposed pyramid histogram of oriented gradients (PHOG) descriptor [34] is to take the spatial property of the local shape into account while representing an image by HOG. The spatial information is represented by tiling the image into regions at multiple resolutions, based on spatial pyramid matching [40]. Each image is divided into a sequence of increasingly finer spatial grids by repeatedly doubling the number of divisions along each axis. The number of points in each grid cell is then recorded.

International Journal of Vehicular Technology 5

Figure 2: Four manually decomposed action primitives, with two example frame sequences each: (a) hand interaction with shift lever (frames 640–660, 1150–1170); (b) hand operating the shift lever (frames 701–709, 794–830); (c) hand interaction with head (frames 2792–2811, 3318–3340); (d) hand interaction with dashboard (frames 4109–4134, 4617–4638).

The number of points in a cell at one level is simply the sum over those contained in the four cells it is divided into at the next level, thus forming a pyramid representation. The cell counts at each level of resolution are the bin counts for the histogram representing that level. The soft correspondence between two point sets can then be computed as a weighted sum over the histogram intersections at each level.

The resolution of an MHI is 640 × 480. An MHI is divided into small spatial cells based on the different pyramid levels. We follow the practice in [34] by limiting the number of levels to $L = 3$ to prevent overfitting. Figure 5 shows that the grid at level $l$ has $2^l \times 2^l$ cells.

The magnitude $m(x,y)$ and orientation $\theta(x,y)$ of the gradient at a pixel $(x,y)$ are calculated as

$$m(x,y)=\sqrt{g_x(x,y)^2+g_y(x,y)^2},\qquad \theta(x,y)=\arctan\frac{g_y(x,y)}{g_x(x,y)},\tag{5}$$

where $g_x(x,y)$ and $g_y(x,y)$ are the image gradients along the $x$ and $y$ directions. Each gradient orientation is quantized into $K$ bins. In each cell of every level, the gradients over all pixels are accumulated into a local $K$-bin histogram. As a result, a ROI at level $l$ is represented by a $K \cdot 2^l \cdot 2^l$-dimensional vector, and all the cells at the different pyramid levels are combined to form a final PHOG vector of dimension $d = K\sum_{l=0}^{L}4^{l}$ representing the whole ROI.

The dimension of the PHOG feature (e.g., $d = 680$ when $K = 8$ and $L = 3$) is relatively high. Many dimension reduction methods can be applied to alleviate this problem; we employ the widely used principal component analysis (PCA) [41] due to its simplicity and effectiveness.
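As a concrete illustration, a PHOG descriptor for an MHI can be sketched in a few lines of NumPy. This is our own simplified implementation, not the authors' code; normalisation choices vary in practice, and here the whole vector is L1-normalised:

```python
import numpy as np

def phog(image, K=8, L=3):
    """Sketch of a PHOG descriptor: magnitude-weighted orientation
    histograms pooled over 2^l x 2^l grids for levels l = 0..L."""
    img = image.astype(np.float64)
    gy, gx = np.gradient(img)                  # gradients along y and x
    mag = np.sqrt(gx ** 2 + gy ** 2)           # eq. (5): magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)    # orientation folded to [0, pi)
    bins = np.minimum((ang / np.pi * K).astype(int), K - 1)

    feats = []
    h, w = img.shape
    for l in range(L + 1):
        n = 2 ** l                             # n x n cells at level l
        ys = np.linspace(0, h, n + 1, dtype=int)
        xs = np.linspace(0, w, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                cb = bins[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
                cm = mag[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].ravel()
                feats.append(np.bincount(cb, weights=cm, minlength=K))
    v = np.concatenate(feats)
    return v / (v.sum() + 1e-12)               # L1-normalise the final vector

mhi = np.zeros((480, 640))
mhi[100:300, 200:400] = 255.0                  # a synthetic bright region
v = phog(mhi)
assert v.shape == (8 * (1 + 4 + 16 + 64),)     # d = K * sum_l 4^l = 680
```

The assertion checks that the concatenated vector has the dimension $d = 680$ derived above for $K = 8$, $L = 3$; the resulting vectors would then be fed to PCA for dimension reduction.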

5. Random Forest (RF) and Other Classification Algorithms

Random forest (RF) [42] is an ensemble classifier built from many decision tree models and can be used for classification or regression. A special advantage of RF is that accuracy and variable importance information are provided with the results. A random forest creates a number of classification trees. When a vector representing a new object is input for classification, it is sent to every tree in the forest. A different subset of the training data is selected (≈2/3), with replacement, to train each tree, and the remaining training data are used to estimate error and variable importance. Class assignment is made by the number of votes from all of the trees.

RF has only two hyperparameters: the number of variables $M$ in the random subset at each node and the number


Figure 3: Example of the driver's right hand moving to the shift lever from the steering wheel (frames 640–656). The first row shows key frames $I(x,y,t)$ of the driving action; the second row shows the corresponding frame-difference images $\mathrm{diff}(x,y,t)$; the third row shows the binary images $D(x,y,t)$ resulting from thresholding; the fourth row shows the cumulative motion energy images $\mathrm{MEI}(x,y,t)$; and the fifth row shows the cumulative motion history images $\mathrm{MHI}(x,y,t)$.

of trees $T$ in the forest [42]. Breiman's RF error rate depends on two quantities: the correlation between any pair of trees and the strength of each individual tree in the forest. Increasing the correlation increases the forest error rate, while increasing the strength of the individual trees decreases this misclassification rate. Reducing $M$ reduces both the correlation and the strength. $M$ is often set to the square root of the number of inputs.

When the training set for a particular tree is drawn by sampling with replacement, about one-third of the cases are left out of the sample set.

The RF algorithm can be summarised as follows

(1) Choose the parameter $T$, the number of trees to grow.

(2) Choose the parameter $m$, the number of variables used to split each node, with $m \ll M$, where $M$ is the number of input variables; $m$ is held constant while the forest is grown.

(3) Grow 119879 trees When growing each tree do the follow-ing

(i) Construct a bootstrap sample of size $n$, drawn with replacement from $S_n = \{(X_i, y_i)\}_{i=1}^{n}$, and grow a tree from this bootstrap sample.

(ii) When growing a tree at each node select 119898variables at random and use them to find thebest split

(iii) Grow the tree to a maximal extent There is nopruning

(4) To classify a point $X$, collect the votes from every tree in the forest and then use majority voting to decide on the class label.
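The procedure above is what standard library implementations provide out of the box. A minimal sketch with scikit-learn (the synthetic features and labels below are hypothetical stand-ins for the PHOG features and the four action classes):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-ins for the PHOG features and the four action labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = rng.integers(0, 4, size=200)
X += y[:, None] * 1.5          # shift the classes apart so they are separable

# T trees; m = sqrt(M) variables tried at each split; trees are fully grown
# (no pruning); the ~1/3 out-of-bag samples give an error estimate.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            oob_score=True, random_state=0)
rf.fit(X, y)
oob_accuracy = rf.oob_score_   # generalisation estimate from the OOB samples
pred = rf.predict(X[:5])       # majority vote over the 100 trees
```

Note how the out-of-bag score gives an error estimate without a separate validation split, which is the "accuracy information provided with the results" mentioned above.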

In this paper, we also compared the accuracy of RF with several popular classification methods, including the $k$-nearest neighbor ($k$NN) classifier, the multilayer perceptron (MLP), and support vector machines (SVM), on the driving action datasets.

5.1. Other Classification Methods

5.1.1. $k$-Nearest Neighbor Classifier. The $k$-nearest neighbour ($k$NN) classifier, one of the most classic and simplest classifiers in machine learning, classifies an object based on the minimal distance to training examples in feature space by a majority vote of its neighbours [41]. As a type of lazy learning, the $k$NN classifier does not perform any distance computation or comparison until the test data are given. Specifically, the object is assigned to the most common class among its $k$ nearest neighbours; for example, the object is classified as the class of its nearest neighbour if $k$ equals 1. Theoretically, the error rate of the $k$NN algorithm approaches the Bayes error as the training set size tends to infinity. In practice, however, satisfactory performance of the $k$NN algorithm requires a large training set, which results in expensive computation.

5.1.2. Multilayer Perceptron Classifier. In neural networks, the multilayer perceptron (MLP) is an extension of the single-


Figure 4: MHIs for different driving actions. (a) Right hand moving to shift lever (frames 640–650). (b) Right hand moving back to steering wheel from shift lever (frames 1150–1170). (c) Right hand operating the shift lever (frames 701–709). (d) Operating the shift lever (frames 794–830). (e) Right hand moving to head from steering wheel (frames 3318–3340). (f) Right hand moving back to steering wheel (frames 2792–2811). (g) Right hand moving back to steering wheel from dashboard (frames 4109–4134). (h) Right hand moving to dashboard from steering wheel (frames 4617–4638).

Figure 5: A schematic illustration of PHOG at levels 0, 1, and 2. At each resolution level, PHOG consists of a histogram of orientation gradients over each image subregion.

layer linear perceptron obtained by adding hidden layers in between [41]. An MLP is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes, that is, an input layer, one or more hidden layers, and an output layer. An MLP classifier is usually trained by the error backpropagation algorithm.

5.1.3. Support Vector Machine. The support vector machine (SVM) is one of the most commonly applied supervised learning algorithms. An SVM is formally defined by a separating hyperplane in a high- or infinite-dimensional space. Given labeled training data, the SVM generates an optimal hyperplane to categorize new examples. Intuitively, the SVM algorithm finds the hyperplane that gives the largest minimum distance to the training examples; that is, the optimal separating hyperplane maximizes the margin of the training data.
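All four classifiers compared in this paper are available in scikit-learn; a hypothetical side-by-side comparison on synthetic data (not the driving dataset, and with hyperparameters chosen only for illustration) might look like:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

# Four loosely separated synthetic classes stand in for the action primitives.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 20))
y = rng.integers(0, 4, size=400)
X += y[:, None] * 1.0

Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=1)
models = {
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "MLP": MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=1),
    "SVM": SVC(kernel="rbf"),
    "RF":  RandomForestClassifier(n_estimators=100, random_state=1),
}
# Fit each model on the training split and score it on the held-out split.
scores = {name: m.fit(Xtr, ytr).score(Xte, yte) for name, m in models.items()}
```

The same fit/score interface makes it straightforward to run all four classifiers on identical training and testing splits, which is exactly the protocol used in the experiments below.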

6. Experiments

6.1. Holdout Experiment. We chose two standard experimental procedures, namely, the holdout approach and the cross-validation approach, to verify the driving action recognition


Figure 6: Bar plots of classification rates from the holdout experiment, with 80% of the data used for training and the remainder for testing: kNN 88.01%, RF 96.56%, SVM 94.43%, MLP 90.93%.

performance using the RF classifier and the PHOG feature extracted from the MHI. The other three classifiers, $k$NN, MLP, and SVM, are compared.

In the holdout experiment, 20% of the PHOG features are randomly selected as the testing dataset, while the remaining 80% are used as the training dataset. The holdout experiment is repeated 100 times and the classification results are recorded. In each holdout cycle, the same training and testing datasets are applied to the four different classifiers simultaneously to compare their performance.
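The repeated-holdout protocol can be sketched as follows. Synthetic features stand in for the PHOG vectors, and we use 20 repetitions instead of 100 only to keep the example fast:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 10))
y = rng.integers(0, 4, size=300)
X += y[:, None] * 1.2          # separable synthetic classes

accuracies = []
for rep in range(20):          # the paper repeats this 100 times
    # a fresh random 80/20 split on every repetition
    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=rep)
    rf = RandomForestClassifier(n_estimators=50, random_state=rep).fit(Xtr, ytr)
    accuracies.append(rf.score(Xte, yte))

mean_acc = float(np.mean(accuracies))   # average accuracy over the repetitions
```

Averaging over many random splits reduces the variance of the estimate that a single 80/20 split would give.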

Generally, classification accuracy is one of the most common indicators used to evaluate classification performance. Figures 6 and 7 are the bar plots and box plots of the classification accuracies of the four classifiers on the same decomposed driving actions. The results are averages over 100 runs. The average classification accuracies of the $k$NN, RF, SVM, and MLP classifiers are 88.01%, 96.56%, 94.43%, and 90.93%, respectively. It is obvious that the RF classifier performs the best among the four classifiers compared.

To further evaluate the performance of the RF classifier, a confusion matrix is used to visualize the discrepancy between the actual class labels and the predicted results. The confusion matrix gives a full picture of the errors made by a classification model. The rows correspond to the known classes of the data, that is, the labels in the data; the columns correspond to the predictions made by the model. The value of each element in the matrix is the number of predictions made with the class corresponding to the column for instances whose correct class is represented by the row. Thus, the diagonal elements show the number of correct classifications made for each class, and the off-diagonal elements show the errors made. Figure 8 shows the confusion matrix from the above experiment for the RF classifier.

Figure 7: Box plots of classification rates from the holdout experiment, with 80% of the data used for training and the remainder for testing.

             Action 1   Action 2   Action 3   Action 4
  Action 1   0.9570     0.0200     0.0100     0.0130
  Action 2   0.0287     0.9713     0          0
  Action 3   0.0038     0          0.9462     0.0500
  Action 4   0          0          0.0123     0.9877

Figure 8: Confusion matrix of the RF classification result from the holdout experiment.

In the figure, the classes labelled one, two, three, and four correspond to hand interaction with the shift lever, operating the shift lever, interaction with the head, and interaction with the dashboard, respectively. In the confusion matrix, the columns are the predicted classes, while the rows are the true ones. For the RF classifier, the average classification rate over the four driving actions is 96.56%; the respective classification accuracies for the four driving actions in the holdout experiment are 95.70%, 97.13%, 94.62%, and 98.77%. The matrix shows that classes one and two tend to be confused with each other, with error rates of about 2.00% and 2.87%, respectively. On the other hand, the error rates from the confusion between classes three and four lie between 1.23% and 5.00%.
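Row-normalised rates like those reported in Figure 8 can be reproduced from raw predictions with scikit-learn's `confusion_matrix` (the labels below are a made-up toy example, not the paper's data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])   # known classes (rows)
y_pred = np.array([0, 1, 1, 1, 2, 3, 3, 3])   # model predictions (columns)

cm = confusion_matrix(y_true, y_pred)
# Dividing each row by its total turns counts into per-class rates,
# the format used in Figures 8 and 11.
cm_rates = cm / cm.sum(axis=1, keepdims=True)

assert cm.diagonal().sum() == 6                # correct predictions sit on the diagonal
assert np.allclose(cm_rates.sum(axis=1), 1.0)  # each true-class row sums to 1
```

With this normalisation, the diagonal entries are exactly the per-class accuracies discussed above.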

6.2. $k$-Fold Cross-Validation. The second part of our experiment uses $k$-fold cross-validation to further confirm the classification performance on the driving actions. In $k$-fold cross-validation, the original data set is randomly partitioned into $k$ subsets. One subset is retained as the validation data for testing, while the remaining $k-1$ subsets are used as training data. The cross-validation process is then repeated $k$ times, which means that each of the $k$


Figure 9: Bar plots of classification rates from 10-fold cross-validation: kNN 94.64%, RF 98.30%, SVM 97.39%, MLP 95.91%.

Figure 10: Box plots of classification rates from 10-fold cross-validation.

subsamples is used exactly once as the validation data. The estimate is the average of the $k$ results. The key property of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once. We chose 10-fold cross-validation in the experiment, which means that nine of the ten split subsets are used for training and the remaining one is reserved for testing.
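This 10-fold procedure maps directly onto scikit-learn's cross-validation utilities; a sketch on synthetic data (a hypothetical stand-in for the PHOG features):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

rng = np.random.default_rng(3)
X = rng.normal(size=(300, 10))
y = rng.integers(0, 4, size=300)
X += y[:, None] * 1.2

# 10 folds: each observation is used for validation exactly once.
cv = KFold(n_splits=10, shuffle=True, random_state=3)
scores = cross_val_score(RandomForestClassifier(n_estimators=50, random_state=3),
                         X, y, cv=cv)
mean_acc = float(scores.mean())   # the cross-validation estimate
```

`cross_val_score` returns one accuracy per fold, so averaging the ten values gives the estimate described above.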

The evaluation procedure is similar to the holdout experiment. The cross-validation experiment was also conducted 100 times for each of the classification methods. Each time, the PHOG features extracted from the driving action dataset were randomly divided into 10 folds. The average classification accuracies over the 100 repetitions are shown in the bar plots of Figure 9 and the box plots of Figure 10. The average classification accuracies of the $k$NN, RF, SVM, and MLP classifiers are 94.64%, 98.30%, 97.39%, and 95.91%, respectively. From the bar plots, box plots, and confusion matrix in Figures 9, 10, and 11, the RF classifier clearly outperforms the other three classifiers compared.

             Action 1   Action 2   Action 3   Action 4
  Action 1   0.9657     0.0237     0.0004     0.0102
  Action 2   0.0292     0.9708     0          0
  Action 3   0.0041     0          0.9483     0.0476
  Action 4   0          0          0.0147     0.9853

Figure 11: Confusion matrix of the RF classification from the 10-fold cross-validation experiment.

7. Conclusion

In this paper, we proposed an efficient approach to recognise driving action primitives by joint application of motion history images and the pyramid histogram of oriented gradients. The proposed driving action primitives lead to a hierarchical representation of driver activities. The manually labelled action primitives are jointly represented by the motion history image and the pyramid histogram of oriented gradients (PHOG). The random forest classifier was exploited to evaluate the classification performance, giving an accuracy of over 94% in the holdout and cross-validation experiments. This compares favorably with some other commonly used classification methods.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The project is supported by Suzhou Municipal Science and Technology Foundation Grants SS201109 and SYG201140.

References

[1] WHO, "World report on road traffic injury prevention," http://www.who.int/violence_injury_prevention/publications/road_traffic/world_report/en/.

[2] T. B. Moeslund, A. Hilton, and V. Kruger, "A survey of advances in vision-based human motion capture and analysis," Computer Vision and Image Understanding, vol. 104, no. 2-3, pp. 90–126, 2006.

[3] X. Fan, B.-C. Yin, and Y.-F. Sun, "Yawning detection for monitoring driver fatigue," in Proceedings of the 6th International Conference on Machine Learning and Cybernetics (ICMLC '07), pp. 664–668, Hong Kong, August 2007.

[4] R. Grace, V. E. Byrne, D. M. Bierman, et al., "A drowsy driver detection system for heavy vehicles," in Proceedings of the 17th Digital Avionics Systems Conference, vol. 2, pp. I361–I368, Bellevue, Wash, USA, 1998.


[5] E. Wahlstrom, O. Masoud, and N. Papanikolopoulos, "Vision-based methods for driver monitoring," in Proceedings of the IEEE Intelligent Transportation Systems, vol. 2, pp. 903–908, Shanghai, China, October 2003.

[6] A. Doshi and M. M. Trivedi, "On the roles of eye gaze and head dynamics in predicting driver's intent to change lanes," IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 3, pp. 453–462, 2009.

[7] J. J. Jo, S. J. Lee, H. G. Jung, K. R. Park, and J. J. Kim, "Vision-based method for detecting driver drowsiness and distraction in driver monitoring system," Optical Engineering, vol. 50, no. 12, pp. 127202–127224, 2011.

[8] X. Liu, Y. D. Zhu, and K. Fujimura, "Real-time pose classification for driver monitoring," in Proceedings of the IEEE Intelligent Transportation Systems, pp. 174–178, Singapore, September 2002.

[9] P. Watta, S. Lakshmanan, and Y. Hou, "Nonparametric approaches for estimating driver pose," IEEE Transactions on Vehicular Technology, vol. 56, no. 4, pp. 2028–2041, 2007.

[10] T. Kato, T. Fujii, and M. Tanimoto, "Detection of driver's posture in the car by using far infrared camera," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 339–344, Parma, Italy, June 2004.

[11] S. Y. Cheng, S. Park, and M. M. Trivedi, "Multi-spectral and multi-perspective video arrays for driver body tracking and activity analysis," Computer Vision and Image Understanding, vol. 106, no. 2-3, pp. 245–257, 2007.

[12] H. Veeraraghavan, N. Bird, S. Atev, and N. Papanikolopoulos, "Classifiers for driver activity monitoring," Transportation Research C, vol. 15, no. 1, pp. 51–67, 2007.

[13] H. Veeraraghavan, S. Atev, N. Bird, P. Schrater, and N. Papanikolopoulos, "Driver activity monitoring through supervised and unsupervised learning," in Proceedings of the 8th International IEEE Conference on Intelligent Transportation Systems, pp. 895–900, Vienna, Austria, September 2005.

[14] C. H. Zhao, B. L. Zhang, J. He, and J. Lian, "Recognition of driving postures by contourlet transform and random forests," IET Intelligent Transport Systems, vol. 6, no. 2, pp. 161–168, 2012.

[15] C. H. Zhao, B. L. Zhang, X. Z. Zhang, S. Q. Zhao, and H. X. Li, "Recognition of driving postures by combined features and random subspace ensemble of multilayer perceptron classifiers," Journal of Neural Computing and Applications, vol. 22, no. 1, supplement, pp. 175–184, 2000.

[16] C. Tran, A. Doshi, and M. M. Trivedi, "Modeling and prediction of driver behavior by foot gesture analysis," Computer Vision and Image Understanding, vol. 116, no. 3, pp. 435–445, 2012.

[17] J. C. McCall and M. M. Trivedi, "Visual context capture and analysis for driver attention monitoring," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 332–337, Washington, DC, USA, October 2004.

[18] K. Torkkola, N. Massey, and C. Wood, "Driver inattention detection through intelligent analysis of readily available sensors," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 326–331, Washington, DC, USA, October 2004.

[19] D. A. Johnson and M. M. Trivedi, "Driving style recognition using a smartphone as a sensor platform," in Proceedings of the 14th IEEE International Intelligent Transportation Systems Conference (ITSC '11), pp. 1609–1615, Washington, DC, USA, October 2011.

[20] A. V. Desai and M. A. Haque, "Vigilance monitoring for operator safety: a simulation study on highway driving," Journal of Safety Research, vol. 37, no. 2, pp. 139–147, 2006.

[21] G. Rigas, Y. Goletsis, P. Bougia, and D. I. Fotiadis, "Towards driver's state recognition on real driving conditions," International Journal of Vehicular Technology, vol. 2011, Article ID 617210, 14 pages, 2011.

[22] M. H. Kutila, M. Jokela, T. Makinen, J. Viitanen, G. Markkula, and T. W. Victor, "Driver cognitive distraction detection: feature estimation and implementation," Proceedings of the Institution of Mechanical Engineers D, vol. 221, no. 9, pp. 1027–1040, 2007.

[23] A. F. Bobick and J. W. Davis, "The recognition of human movement using temporal templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 257–267, 2001.

[24] Y. Wang, K. Huang, and T. Tan, "Human activity recognition based on R transform," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–8, Minneapolis, Minn, USA, June 2007.

[25] H. Chen, H. Chen, Y. Chen, and S. Lee, "Human action recognition using star skeleton," in Proceedings of the International Workshop on Video Surveillance and Sensor Networks (VSSN '06), pp. 171–178, Santa Barbara, Calif, USA, October 2006.

[26] W. Liang and D. Suter, "Informative shape representations for human action recognition," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 1266–1269, Hong Kong, August 2006.

[27] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proceedings of the International Conference on Computer Vision, vol. 2, pp. 726–733, Nice, France, October 2003.

[28] A. R. Ahad, T. Ogata, J. K. Tan, H. S. Kim, and S. Ishikawa, "Motion recognition approach to solve overwriting in complex actions," in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG '08), pp. 1–6, Amsterdam, The Netherlands, September 2008.

[29] S. Ali and M. Shah, "Human action recognition in videos using kinematic features and multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 288–303, 2010.

[30] S. Danafar and N. Gheissari, "Action recognition for surveillance applications using optic flow and SVM," in Proceedings of the Asian Conference on Computer Vision (ACCV '07), pp. 457–466, Tokyo, Japan, November 2007.

[31] I. Laptev and T. Lindeberg, "Space-time interest points," in Proceedings of the 9th IEEE International Conference on Computer Vision, pp. 432–439, Nice, France, October 2003.

[32] I. Laptev, B. Caputo, C. Schuldt, and T. Lindeberg, "Local velocity-adapted motion events for spatio-temporal recognition," Computer Vision and Image Understanding, vol. 108, no. 3, pp. 207–229, 2007.

[33] S. Park and M. Trivedi, "Driver activity analysis for intelligent vehicles: issues and development framework," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 644–649, Las Vegas, Nev, USA, June 2005.

[34] A. Bosch, A. Zisserman, and X. Munoz, "Representing shape with a spatial pyramid kernel," in Proceedings of the 6th ACM International Conference on Image and Video Retrieval (CIVR '07), pp. 401–408, Amsterdam, The Netherlands, July 2007.


[35] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, San Diego, Calif, USA, June 2005.

[36] G. R. Bradski and J. Davis, "Motion segmentation and pose recognition with motion history gradients," in Proceedings of the 5th IEEE Workshop on Applications of Computer Vision, pp. 238–244, Palm Springs, Calif, USA, 2000.

[37] O. Masoud and N. Papanikolopoulos, "A method for human action recognition," Image and Vision Computing, vol. 21, no. 8, pp. 729–743, 2003.

[38] H. Yi, D. Rajan, and L.-T. Chia, "A new motion histogram to index motion content in video segments," Pattern Recognition Letters, vol. 26, no. 9, pp. 1221–1231, 2005.

[39] J. Flusser, T. Suk, and B. Zitova, Moments and Moment Invariants in Pattern Recognition, John Wiley & Sons, Chichester, UK, 2009.

[40] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), pp. 2169–2178, New York, NY, USA, June 2006.

[41] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, 2nd edition, 2000.

[42] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.


International Journal of Vehicular Technology 5

Hand interaction with shift lever

Frame 640 650 660

Frame 1150 1159 1170

(a)

Hand operating the shift lever

Frame 794 830815

Frame 701 705 709

(b)

Hand interaction with head

Frame 3318 3327 3340

Frame 2792 2801 2811

(c)

Hand interaction with dashboardFrame 4617 4631 4638

Frame 4109 4118 4134

(d)

Figure 2 Four manually decomposed action primitives

The number of points in a cell at one level is simply the sumover those contained in the four cells it is divided into at thenext level thus forming a pyramid representation The cellcounts at each level of resolution are the bin counts for thehistogram representing that level The soft correspondencebetween the two point sets can then be computed as aweighted sum over the histogram intersections at each level

The resolution of an MHI image is 640 times 480 An MHIis divided into small spatial cells based on different pyramidlevels We follow the practice in [34] by limiting the numberof levels to 119871 = 3 to prevent overfitting Figure 5 shows thatthe pyramid at level 119897 has 2119899 times 2119899 cells

The magnitude 119898(119909 119910) and orientation 120579(119909 119910) of thegradient on a pixel (119909 119910) are calculated as follows

(119909 119910) = radic119892119909(119909 119910)

2

+ 119910119910(119909 119910)

2

120579 (119909 119910) = arctan119892119909(119909 119910)

119892119910(119909 119910)

(5)

where 119892119909(119909 119910) and 119892

119910(119909 119910) are image gradients along the 119909

and 119910 directions Each gradient orientation is quantized into119870 bins In each cell of every level gradients over all the pixelsare concatenated to form a local119870 bins histogram As a resulta ROI at level 119897 is represented as a 11987021198972119897 dimension vector

All the cells at different pyramid levels are combined to forma final PHOG vector with dimension of 119889 = 119870sum

119871

119897=04119897 to

represent the whole ROIThe dimension of the PHOG feature (eg 119889 = 680 when

119870 = 8 119871 = 3) is relatively high Many dimension reductionmethods can be applied to alleviate the problem We employthe widely used principal component analysis (PCA) [41] dueto its simplicity and effectiveness

5 Random Forest (RF) and OtherClassification Algorithms

Random forest (RF) [42] is an ensemble classifier usingmanydecision tree models which can be used for classification orregression A special advantage of RF is that the accuracy andvariable importance information is provided with the resultsRandom forests create a number of classification trees Whenan vector representing a new object is input for classificationit was sent to every tree in the forest A different subset of thetraining data are selected (asymp23) with replacement to traineach tree and remaining training data are used to estimateerror and variable importance Class assignment is made bythe number of votes from all of the trees

RF has only two hyperparameters the number of vari-ables119872 in the random subset at each node and the number

6 International Journal of Vehicular Technology

Frame 640 644 648 652 656

I(x y t)

diff(x y t)

D(x y t)

MEI(x y t)

MHI(x y t)

Figure 3 Example of the driverrsquos right hand moving to shift lever from steering wheel The first row is some key frames in a driving actionThe second row is the corresponding frame difference images The third row is binary images resulted from thresholding The forth row iscumulative motion energy images The fifth row is cumulative motion history images

of trees 119879 in the forest [42] Breimarsquos RF error rate dependson two parameters the correlation between any pair oftrees and the strength of each individual tree in the forestIncreasing the correlation increases the forest error rate whileincreasing the strength of the individual trees decreasesthis misclassification rate Reducing 119872 reduces both thecorrelation and the strength119872 is often set to the square rootof the number of inputs

When the training set for a particular tree is drawn by sampling with replacement, about one-third of the cases are left out of the sample set.

The RF algorithm can be summarised as follows:

(1) Choose parameter T, which is the number of trees to grow.

(2) Choose parameter m, which is used to split each node, with m ≪ M, where M is the number of input variables; m is held constant while growing the forest.

(3) Grow T trees. When growing each tree, do the following:

(i) Construct a bootstrap sample of size n, sampled with replacement from S_n = {(X_i, y_i), i = 1, ..., n}, and grow a tree from this bootstrap sample.

(ii) When growing a tree, at each node select m variables at random and use them to find the best split.

(iii) Grow the tree to a maximal extent. There is no pruning.

(4) To classify point X, collect votes from every tree in the forest and then use majority voting to decide on the class label.
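The procedure above can be sketched with scikit-learn's random forest. This is an illustrative example, not the authors' code: the data is random, and only the feature dimension (680) and the four action classes come from the paper; the tree count T = 100 is an assumption.

```python
# Hedged sketch of random forest training on PHOG-style vectors.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((200, 680))   # 200 samples of a 680-dim PHOG-like feature
y = rng.integers(0, 4, 200)  # 4 action-primitive classes

# n_estimators plays the role of T; max_features="sqrt" sets m to the
# square root of the number of input variables M, as the text suggests.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                            oob_score=True, random_state=0)
rf.fit(X, y)

# The ~1/3 of cases left out of each bootstrap give the out-of-bag
# error estimate mentioned above.
print(rf.oob_score_)
```

Classification of a new point X then amounts to `rf.predict(X)`, which internally performs the majority vote of step (4).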

In this paper, we also compared the accuracy of RF with several popular classification methods, including the k-nearest neighbor (kNN) classifier, the multilayer perceptron (MLP), and the support vector machine (SVM), on the driving action dataset.

5.1. Other Classification Methods

5.1.1. k-Nearest Neighbor Classifier. The k-nearest neighbour (kNN) classifier, one of the most classic and simplest classifiers in machine learning, classifies an object based on the minimal distance to training examples in feature space by a majority vote of its neighbours [41]. As a type of lazy learning, the kNN classifier does not perform any distance computation or comparison until the test data are given. Specifically, the object is assigned to the most common class among its k nearest neighbours; for example, the object is classified as the class of its nearest neighbour if k equals 1. Theoretically, the error rate of the kNN algorithm approaches the Bayes error as the training set size tends to infinity. However, satisfactory performance of the kNN algorithm requires a large training dataset, which results in expensive computation in practice.
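The voting rule just described can be written in a few lines of NumPy. This is a generic illustration with made-up 2-D points, not the paper's feature space:

```python
# Minimal k-NN classifier: assign the query the majority label
# among its k nearest training points (Euclidean distance).
import numpy as np

def knn_predict(X_train, y_train, x, k=3):
    d = np.linalg.norm(X_train - x, axis=1)   # distances to all training points
    nearest = np.argsort(d)[:k]               # indices of the k nearest
    votes = np.bincount(y_train[nearest])     # count votes per class
    return int(np.argmax(votes))              # majority class

X_train = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([0.95, 1.0]), k=3))  # → 1
```

With k = 1 the query simply inherits the label of its single nearest neighbour, matching the special case mentioned above.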

Figure 4: MHIs for different driving actions. (a) Right hand moving to the shift lever. (b) Right hand moving back to the steering wheel from the shift lever. (c) Right hand operating the shift lever. (d) Operating the shift lever. (e) Right hand moving to the head from the steering wheel. (f) Right hand moving back to the steering wheel. (g) Right hand moving back to the steering wheel from the dashboard. (h) Right hand moving to the dashboard from the steering wheel.

Figure 5: A schematic illustration of PHOG at levels 0, 1, and 2. At each resolution level, PHOG consists of a histogram of orientation gradients over each image subregion.

5.1.2. Multilayer Perceptron Classifier. In neural networks, the multilayer perceptron (MLP) extends the single-layer linear perceptron by adding hidden layers in between [41]. An MLP is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. It consists of multiple layers of nodes, that is, an input layer, one or more hidden layers, and an output layer. An MLP classifier is usually trained by the error backpropagation algorithm.
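A small MLP of this input/hidden/output shape can be sketched with scikit-learn, which trains by backpropagation. This is an illustration only: the data is synthetic and the single 16-unit hidden layer is an invented size, not the configuration used in the paper.

```python
# Hedged sketch: a feedforward MLP (one hidden layer) trained by
# backpropagation on a simple synthetic two-class problem.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.random((80, 4))                      # 80 samples, 4 input features
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)    # a simple separable labelling rule

# input layer (4 units) -> hidden layer (16 units) -> output layer
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(X, y)
print(mlp.score(X, y))                       # training accuracy
```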

5.1.3. Support Vector Machine. The support vector machine (SVM) is one of the most commonly applied supervised learning algorithms. An SVM is formally defined by a separating hyperplane in a high- or infinite-dimensional space. Given labeled training data, the SVM generates an optimal hyperplane to categorize new examples. Intuitively, the SVM algorithm finds the hyperplane that gives the largest minimum distance to the training examples; the optimal separating hyperplane maximizes the margin of the training data.
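The maximum-margin idea can be demonstrated on a toy linearly separable problem with scikit-learn's SVC. The data, kernel choice, and C value here are invented for illustration and do not reflect the paper's experimental setup:

```python
# Hedged sketch: a linear SVM finds the maximum-margin hyperplane
# separating two small clusters of 2-D points.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
y = np.array([0, 0, 1, 1])

clf = SVC(kernel="linear", C=1.0)   # linear kernel on separable data
clf.fit(X, y)
print(clf.predict([[0.1, 0.0], [1.1, 1.0]]))  # → [0 1]
```

The fitted `clf.support_vectors_` are the training points lying on the margin, which alone determine the separating hyperplane.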

6. Experiments

6.1. Holdout Experiment. We chose two standard experimental procedures, namely the holdout approach and the cross-validation approach, to verify the driving action recognition performance using the RF classifier and the PHOG feature extracted from the MHI. The other three classifiers, kNN, MLP, and SVM, are compared.

Figure 6: Bar plots of classification rates from the holdout experiment, with 80% of the data used for training and the remainder for testing (kNN 88.01%, RF 96.56%, SVM 94.43%, MLP 90.93%).

In the holdout experiment, 20% of the PHOG features are randomly selected as the testing dataset, while the remaining 80% are used as the training dataset. The holdout experiment is repeated 100 times and the classification results are recorded. In each cycle, the same training and testing datasets are applied to the four different classifiers to compare their performance.
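The 80/20 split described above can be sketched as a random index permutation. The sample count of 50 is an invented placeholder; in the experiment the split is redrawn in each of the 100 cycles:

```python
# Sketch of one cycle of the repeated 80/20 holdout split.
import numpy as np

rng = np.random.default_rng(0)
n = 50                                # number of samples (placeholder)
idx = rng.permutation(n)              # shuffle the sample indices
n_test = n // 5                       # 20% held out for testing
test_idx, train_idx = idx[:n_test], idx[n_test:]

# Each classifier is trained on train_idx and evaluated on test_idx;
# redrawing idx gives the next holdout cycle.
print(len(train_idx), len(test_idx))  # → 40 10
```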

Generally, classification accuracy is one of the most common indicators used to evaluate classification performance. Figures 6 and 7 show the bar plots and box plots of the classification accuracies of the four classifiers on the same decomposed driving actions. The results are averaged over 100 runs. The average classification accuracies of the kNN, RF, SVM, and MLP classifiers are 88.01%, 96.56%, 94.43%, and 90.93%, respectively. The RF classifier clearly performs the best among the four classifiers compared.

To further evaluate the performance of the RF classifier, a confusion matrix is used to visualize the discrepancy between the actual class labels and the predicted results. The confusion matrix gives a full picture of the errors made by a classification model. The rows correspond to the known classes of the data, that is, the labels in the data; the columns correspond to the predictions made by the model. The value of each element in the matrix is the number of examples of the row's true class that were predicted as the column's class. Thus, the diagonal elements show the number of correct classifications made for each class, and the off-diagonal elements show the errors made. Figure 8 shows the confusion matrix from the above experiment for the RF classifier.

Figure 7: Box plots of classification rates from the holdout experiment, with 80% of the data used for training and the remainder for testing.

Figure 8: Confusion matrix of the RF classification result from the holdout experiment (rows: true class; columns: predicted class):

          Action 1  Action 2  Action 3  Action 4
Action 1   0.9570    0.0200    0.0100    0.0130
Action 2   0.0287    0.9713    0         0
Action 3   0.0038    0         0.9462    0.0500
Action 4   0         0         0.0123    0.9877

In the figure, classes one, two, three, and four correspond to interaction with the shift lever, operating the shift lever, interaction with the head, and interaction with the dashboard, respectively. In the confusion matrix, the columns are the predicted classes, while the rows are the true ones. For the RF classifier, the average classification rate over the four driving actions is 96.56%. The respective classification accuracies for the four driving actions in the holdout experiment are 95.70%, 97.13%, 94.62%, and 98.77%. Classes one and two tend to be confused with each other, with error rates of about 2% and 2.87%, respectively, while the error rates from the confusion between classes three and four lie between 1.23% and 5%.
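The counting rule behind such a matrix is simple to sketch. The labels below are made up; the point is only the (true, predicted) bookkeeping, with rows as true classes and columns as predictions:

```python
# Building a confusion matrix by counting (true, predicted) pairs.
import numpy as np

y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3])   # invented true labels
y_pred = np.array([0, 1, 1, 1, 2, 3, 3, 3])   # invented predictions

n_classes = 4
cm = np.zeros((n_classes, n_classes), dtype=int)
for t, p in zip(y_true, y_pred):
    cm[t, p] += 1                              # row = true, column = predicted

# Diagonal entries are correct classifications; their share is the accuracy.
print(cm.diagonal().sum() / cm.sum())          # → 0.75
```

Normalizing each row by its sum turns the counts into the per-class rates shown in Figure 8.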

6.2. k-Fold Cross-Validation. The second part of our experiment uses k-fold cross-validation to further confirm the classification performance on the driving actions. In k-fold cross-validation, the original set of data is randomly partitioned into k subsets. One subset is retained as the validation data for testing, while the remaining k − 1 subsets are used as training data. The cross-validation process is then repeated k times, so that each of the k

Figure 9: Bar plots of classification rates from 10-fold cross-validation (kNN 94.64%, RF 98.30%, SVM 97.39%, MLP 95.91%).

Figure 10: Box plots of classification rates from 10-fold cross-validation.

subsamples will be used exactly once as the validation data. The estimate is the average of the k results. The key property of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once. We chose 10-fold cross-validation in the experiment, which means that nine of the ten split sets are used for training and the remaining one is reserved for testing.
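The fold rotation just described can be sketched as index bookkeeping. The sample count of 100 is a placeholder; the structure (k disjoint validation folds, each used exactly once) is the point:

```python
# Sketch of 10-fold cross-validation index generation: every
# observation lands in exactly one validation fold.
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 10
idx = rng.permutation(n)
folds = np.array_split(idx, k)        # k disjoint validation folds

for i, val in enumerate(folds):
    # train on the other k-1 folds, validate on fold i;
    # the final estimate averages the k validation accuracies
    train = np.concatenate([f for j, f in enumerate(folds) if j != i])

print(sum(len(f) for f in folds))     # → 100
```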

The evaluation procedure is similar to the holdout experiment. The cross-validation experiment was also conducted 100 times for each of the classification methods. Each time, the PHOG features extracted from the driving action dataset were randomly divided into 10 folds. The average classification accuracies over the 100 repetitions are shown in the bar plots of Figure 9 and the box plots of Figure 10. The average classification accuracies of the kNN, RF, SVM, and MLP classifiers are 94.64%, 98.30%, 97.39%, and 95.91%, respectively. From the bar plots, box plots, and confusion matrix in Figures 9, 10, and 11, the RF classifier clearly outperforms the other three classifiers compared.

Figure 11: Confusion matrix of the RF classification from the 10-fold cross-validation experiment (rows: true class; columns: predicted class):

          Action 1  Action 2  Action 3  Action 4
Action 1   0.9657    0.0237    0.0004    0.0102
Action 2   0.0292    0.9708    0         0
Action 3   0.0041    0         0.9483    0.0476
Action 4   0         0         0.0147    0.9853

7. Conclusion

In this paper, we proposed an efficient approach to recognise driving action primitives by the joint application of motion history image and pyramid histogram of oriented gradients. The proposed driving action primitives lead to a hierarchical representation of driver activities. The manually labelled action primitives are jointly represented by the motion history image and the pyramid histogram of oriented gradients (PHOG). The random forest classifier was exploited to evaluate the classification performance, giving an accuracy of over 94% in the holdout and cross-validation experiments. This compares favorably with some other commonly used classification methods.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The project is supported by Suzhou Municipal Science and Technology Foundation Grants SS201109 and SYG201140.

References

[1] WHO, "World report on road traffic injury prevention," http://www.who.int/violence_injury_prevention/publications/road_traffic/world_report/en/.

[2] T. B. Moeslund, A. Hilton, and V. Kruger, "A survey of advances in vision-based human motion capture and analysis," Computer Vision and Image Understanding, vol. 104, no. 2-3, pp. 90–126, 2006.

[3] X. Fan, B.-C. Yin, and Y.-F. Sun, "Yawning detection for monitoring driver fatigue," in Proceedings of the 6th International Conference on Machine Learning and Cybernetics (ICMLC '07), pp. 664–668, Hong Kong, August 2007.

[4] R. Grace, V. E. Byrne, D. M. Bierman et al., "A drowsy driver detection system for heavy vehicles," in Proceedings of the 17th Digital Avionics Systems Conference, vol. 2, pp. I36/1–I36/8, Bellevue, Wash, USA, 1998.


[5] E. Wahlstrom, O. Masoud, and N. Papanikolopoulos, "Vision-based methods for driver monitoring," in Proceedings of the IEEE Intelligent Transportation Systems, vol. 2, pp. 903–908, Shanghai, China, October 2003.

[6] A. Doshi and M. M. Trivedi, "On the roles of eye gaze and head dynamics in predicting driver's intent to change lanes," IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 3, pp. 453–462, 2009.

[7] J. J. Jo, S. J. Lee, H. G. Jung, K. R. Park, and J. J. Kim, "Vision-based method for detecting driver drowsiness and distraction in driver monitoring system," Optical Engineering, vol. 50, no. 12, pp. 127202–127224, 2011.

[8] X. Liu, Y. D. Zhu, and K. Fujimura, "Real-time pose classification for driver monitoring," in Proceedings of the IEEE Intelligent Transportation Systems, pp. 174–178, Singapore, September 2002.

[9] P. Watta, S. Lakshmanan, and Y. Hou, "Nonparametric approaches for estimating driver pose," IEEE Transactions on Vehicular Technology, vol. 56, no. 4, pp. 2028–2041, 2007.

[10] T. Kato, T. Fujii, and M. Tanimoto, "Detection of driver's posture in the car by using far infrared camera," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 339–344, Parma, Italy, June 2004.

[11] S. Y. Cheng, S. Park, and M. M. Trivedi, "Multi-spectral and multi-perspective video arrays for driver body tracking and activity analysis," Computer Vision and Image Understanding, vol. 106, no. 2-3, pp. 245–257, 2007.

[12] H. Veeraraghavan, N. Bird, S. Atev, and N. Papanikolopoulos, "Classifiers for driver activity monitoring," Transportation Research C, vol. 15, no. 1, pp. 51–67, 2007.

[13] H. Veeraraghavan, S. Atev, N. Bird, P. Schrater, and N. Papanikolopoulos, "Driver activity monitoring through supervised and unsupervised learning," in Proceedings of the 8th International IEEE Conference on Intelligent Transportation Systems, pp. 895–900, Vienna, Austria, September 2005.

[14] C. H. Zhao, B. L. Zhang, J. He, and J. Lian, "Recognition of driving postures by contourlet transform and random forests," IET Intelligent Transport Systems, vol. 6, no. 2, pp. 161–168, 2012.

[15] C. H. Zhao, B. L. Zhang, X. Z. Zhang, S. Q. Zhao, and H. X. Li, "Recognition of driving postures by combined features and random subspace ensemble of multilayer perceptron classifiers," Journal of Neural Computing and Applications, vol. 22, no. 1, Supplement, pp. 175–184, 2013.

[16] C. Tran, A. Doshi, and M. M. Trivedi, "Modeling and prediction of driver behavior by foot gesture analysis," Computer Vision and Image Understanding, vol. 116, no. 3, pp. 435–445, 2012.

[17] J. C. McCall and M. M. Trivedi, "Visual context capture and analysis for driver attention monitoring," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 332–337, Washington, DC, USA, October 2004.

[18] K. Torkkola, N. Massey, and C. Wood, "Driver inattention detection through intelligent analysis of readily available sensors," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 326–331, Washington, DC, USA, October 2004.

[19] D. A. Johnson and M. M. Trivedi, "Driving style recognition using a smartphone as a sensor platform," in Proceedings of the 14th IEEE International Intelligent Transportation Systems Conference (ITSC '11), pp. 1609–1615, Washington, DC, USA, October 2011.

[20] A. V. Desai and M. A. Haque, "Vigilance monitoring for operator safety: a simulation study on highway driving," Journal of Safety Research, vol. 37, no. 2, pp. 139–147, 2006.

[21] G. Rigas, Y. Goletsis, P. Bougia, and D. I. Fotiadis, "Towards driver's state recognition on real driving conditions," International Journal of Vehicular Technology, vol. 2011, Article ID 617210, 14 pages, 2011.

[22] M. H. Kutila, M. Jokela, T. Makinen, J. Viitanen, G. Markkula, and T. W. Victor, "Driver cognitive distraction detection: feature estimation and implementation," Proceedings of the Institution of Mechanical Engineers D, vol. 221, no. 9, pp. 1027–1040, 2007.

[23] A. F. Bobick and J. W. Davis, "The recognition of human movement using temporal templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 257–267, 2001.

[24] Y. Wang, K. Huang, and T. Tan, "Human activity recognition based on R transform," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–8, Minneapolis, Minn, USA, June 2007.

[25] H. Chen, H. Chen, Y. Chen, and S. Lee, "Human action recognition using star skeleton," in Proceedings of the International Workshop on Video Surveillance and Sensor Networks (VSSN '06), pp. 171–178, Santa Barbara, Calif, USA, October 2006.

[26] W. Liang and D. Suter, "Informative shape representations for human action recognition," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 1266–1269, Hong Kong, August 2006.

[27] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proceedings of the International Conference on Computer Vision, vol. 2, pp. 726–733, Nice, France, October 2003.

[28] A. R. Ahad, T. Ogata, J. K. Tan, H. S. Kim, and S. Ishikawa, "Motion recognition approach to solve overwriting in complex actions," in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG '08), pp. 1–6, Amsterdam, The Netherlands, September 2008.

[29] S. Ali and M. Shah, "Human action recognition in videos using kinematic features and multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 288–303, 2010.

[30] S. Danafar and N. Gheissari, "Action recognition for surveillance applications using optic flow and SVM," in Proceedings of the Asian Conference on Computer Vision (ACCV '07), pp. 457–466, Tokyo, Japan, November 2007.

[31] I. Laptev and T. Lindeberg, "Space-time interest points," in Proceedings of the 9th IEEE International Conference on Computer Vision, pp. 432–439, Nice, France, October 2003.

[32] I. Laptev, B. Caputo, C. Schuldt, and T. Lindeberg, "Local velocity-adapted motion events for spatio-temporal recognition," Computer Vision and Image Understanding, vol. 108, no. 3, pp. 207–229, 2007.

[33] S. Park and M. Trivedi, "Driver activity analysis for intelligent vehicles: issues and development framework," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 644–649, Las Vegas, Nev, USA, June 2005.

[34] A. Bosch, A. Zisserman, and X. Munoz, "Representing shape with a spatial pyramid kernel," in Proceedings of the 6th ACM International Conference on Image and Video Retrieval (CIVR '07), pp. 401–408, Amsterdam, The Netherlands, July 2007.


[35] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, San Diego, Calif, USA, June 2005.

[36] G. R. Bradski and J. Davis, "Motion segmentation and pose recognition with motion history gradients," in Proceedings of the 5th IEEE Workshop on Applications of Computer Vision, pp. 238–244, Palm Springs, Calif, USA, 2000.

[37] O. Masoud and N. Papanikolopoulos, "A method for human action recognition," Image and Vision Computing, vol. 21, no. 8, pp. 729–743, 2003.

[38] H. Yi, D. Rajan, and L.-T. Chia, "A new motion histogram to index motion content in video segments," Pattern Recognition Letters, vol. 26, no. 9, pp. 1221–1231, 2005.

[39] J. Flusser, T. Suk, and B. Zitová, Moments and Moment Invariants in Pattern Recognition, John Wiley & Sons, Chichester, UK, 2009.

[40] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), pp. 2169–2178, New York, NY, USA, June 2006.

[41] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, 2nd edition, 2000.

[42] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.


[6] A Doshi and M M Trivedi ldquoOn the roles of eye gaze and headdynamics in predicting driverrsquos intent to change lanesrdquo IEEETransactions on Intelligent Transportation Systems vol 10 no3 pp 453ndash462 2009

[7] J J Jo S J Lee H G Jung K R Park and J J Kim ldquoVision-basedmethod for detecting driver drowsiness and distraction indriver monitoringsystemrdquo Optical Engineering vol 50 no 12pp 127202ndash127224 2011

[8] X Liu Y D Zhu and K Fujimura ldquoReal-time pose classicationfor driver monitoringrdquo in Proceedings of the IEEE IntelligentTransportation Systems pp 174ndash178 Singapore September2002

[9] P Watta S Lakshmanan and Y Hou ldquoNonparametricapproaches for estimating driver poserdquo IEEE Transactionson Vehicular Technology vol 56 no 4 pp 2028ndash2041 2007

[10] T Kato T Fujii andM Tanimoto ldquoDetection of driverrsquos posturein the car by using far infrared camerardquo in Proceedings of theIEEE Intelligent Vehicles Symposium pp 339ndash344 Parma ItalyJune 2004

[11] S Y Cheng S Park and M M Trivedi ldquoMulti-spectral andmulti-perspective video arrays for driver body tracking andactivity analysisrdquo Computer Vision and Image Understandingvol 106 no 2-3 pp 245ndash257 2007

[12] H Veeraraghavan N Bird S Atev and N Papanikolopou-los ldquoClassifiers for driver activity monitoringrdquo TransportationResearch C vol 15 no 1 pp 51ndash67 2007

[13] H Veeraraghavan S Atev N Bird P Schrater and NPapanikolopoulos ldquoDriver activity monitoring through super-vised and unsupervised learningrdquo in Proceedings of the 8thInternational IEEE Conference on Intelligent TransportationSystems pp 895ndash900 Vienna Austria September 2005

[14] C H Zhao B L Zhang J He and J Lian ldquoRecognition ofdriving postures by contourlet transform and random forestsrdquoIET Intelligent Transport Systems vol 6 no 2 pp 161ndash168 2012

[15] C H Zhao B L Zhang X Z Zhang S Q Zhao and H XLi ldquoRecognition of driving postures by combined features andrandom subspace ensemble of multilayer perception classifferrdquoJournal of Neural Computing and Applications vol 22 no 1Supplement pp 175ndash184 2000

[16] C Tran A Doshi andMM Trivedi ldquoModeling and predictionof driver behavior by foot gesture analysisrdquo Computer Visionand Image Understanding vol 116 no 3 pp 435ndash445 2012

[17] J C McCall and M M Trivedi ldquoVisual context capture andanalysis for driver attention monitoringrdquo in Proceedings of the7th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo04) pp 332ndash337 Washington DC USA Octo-ber 2004

[18] K Torkkola N Massey and C Wood ldquoDriver inattentiondetection through intelligent analysis of readily available sen-sorsrdquo in Proceedings of the 7th International IEEE Conferenceon Intelligent Transportation Systems (ITSC rsquo04) pp 326ndash331Washington DC USA October 2004

[19] D A Johnson and M M Trivedi ldquoDriving style recognitionusing a smartphone as a sensor platformrdquo in Proceedings ofthe 14th IEEE International Intelligent Transportation SystemsConference (ITSC rsquo11) pp 1609ndash1615 Washington DC USAOctober 2011

[20] A V Desai and M A Haque ldquoVigilance monitoring foroperator safety a simulation study on highway drivingrdquo Journalof Safety Research vol 37 no 2 pp 139ndash147 2006

[21] G Rigas Y Goletsis P Bougia and D I Fotiadis ldquoTowardsdriverrsquos state recognition on real driving conditionsrdquo Interna-tional Journal of Vehicular Technology vol 2011 Article ID617210 14 pages 2011

[22] M H Kutila M Jokela T Makinen J Viitanen G MarkkulaandTWVictor ldquoDriver cognitive distraction detection featureestimation and implementationrdquoProceedings of the Institution ofMechanical Engineers D vol 221 no 9 pp 1027ndash1040 2007

[23] A F Bobick and J W Davis ldquoThe recognition of humanmovement using temporal templatesrdquo IEEE Transactions onPattern Analysis andMachine Intelligence vol 23 no 3 pp 257ndash267 2001

[24] Y Wang K Huang and T Tan ldquoHuman activity recognitionbased on R transformrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo07) pp 1ndash8Minneapolis Minn USA June 2007

[25] H Chen H Chen Y Chen and S Lee ldquoHuman actionrecognition using star skeletonrdquo in Proceedings of the Inter-national Workshop on Video Surveillance and Sensor Networks(VSSN rsquo06) pp 171ndash178 Santa Barbara Calif USA October2006

[26] W Liang and D Suter ldquoInformative shape representationsfor human action recognitionrdquo in Proceedings of the 18thInternational Conference on Pattern Recognition (ICPR rsquo06) pp1266ndash1269 Hong Kong August 2006

[27] A A Efros A C Berg G Mori and J Malik ldquoRecognizingaction at a distancerdquo in Proceedings of the International Con-ference on Computer Vision vol 2 pp 726ndash733 Nice FranceOctober 2003

[28] A R Ahad T Ogata J K Tan H S Kim and S IshikawaldquoMotion recognition approach to solve overwriting in complexactionsrdquo in Proceedings of the 8th IEEE International Conferenceon Automatic Face and Gesture Recognition (FG rsquo08) pp 1ndash6Amsterdam The Netherlands September 2008

[29] S Ali and M Shah ldquoHuman action recognition in videosusing kinematic features and multiple instance learningrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol32 no 2 pp 288ndash303 2010

[30] S Danafar and N Gheissari ldquoAction recognition for surveil-lanceapplications using optic ow and SVMrdquo in Proceedings ofthe Asian Conferenceon Computer Vision (ACCV rsquo07) pp 457ndash466 Tokyo Japan November 2007

[31] I Laptev and T Lindeberg ldquoSpace-time interest pointsrdquo in Pro-ceedings of the 9th IEEE International Conference on ComputerVision pp 432ndash439 Nice France October 2003

[32] I Laptev B Caputo C Schuldt and T Lindeberg ldquoLocalvelocity-adapted motion events for spatio-temporal recogni-tionrdquo Computer Vision and Image Understanding vol 108 no3 pp 207ndash229 2007

[33] S Park and M Trivedi ldquoDriver activity analysis for intelligentvehicles issues and development frameworkrdquo in Proceedings ofthe IEEE Intelligent Vehicles Symposium pp 644ndash649 LasVegasNev USA June 2005

[34] A Bosch A Zisserman and X Munoz ldquoRepresenting shapewith a spatial pyramid kernelrdquo in Proceedings of the 6thACM International Conference on Image and Video Retrieval(CIVR rsquo07) pp 401ndash408 Amsterdam The Netherlands July2007

International Journal of Vehicular Technology 11

[35] N Dalal and B Triggs ldquoHistograms of oriented gradientsfor human detectionrdquo in Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition(CVPR rsquo05) pp 886ndash893 San Diego Calif USA June 2005

[36] G R Bradski and J Davis ldquoMotion segmentation and poserecognitionwithmotion history gradientsrdquo inProceedings of the5th IEEEWorkshop onApplications of Computer Vision pp 238ndash244 Palm Springs Calif USA 2000

[37] O Masoud and N Papanikolopoulos ldquoA method for humanaction recognitionrdquo Image and Vision Computing vol 21 no 8pp 729ndash743 2003

[38] H Yi D Rajan and L-T Chia ldquoA new motion histogram toindex motion content in video segmentsrdquo Pattern RecognitionLetters vol 26 no 9 pp 1221ndash1231 2005

[39] J Flusser T Suk and B ZitovMoments andMoment Invariantsin Pattern Recognition John Wiley amp Sons Chichester UK2009

[40] S Lazebnik C Schmid and J Ponce ldquoBeyond bags of featuresspatial pyramid matching for recognizing natural scene cate-goriesrdquo in Proceedings of the IEEE Computer Society Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo06) pp2169ndash2178 New York NY USA June 2006

[41] R O Duda P E Hart and D G Stork Pattern ClassiffcationWiley-Interscience 2nd edition 2000

[42] L Breiman ldquoRandom forestsrdquoMachine Learning vol 45 no 1pp 5ndash32 2001

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Advances inAcoustics ampVibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

thinspJournalthinspofthinsp

Sensors

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Antennas andPropagation

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

International Journal of Vehicular Technology 7

Figure 4: MHIs for different driving actions. (a) Right hand moving to shift lever (frames 640-650). (b) Right hand moving back to steering wheel from shift lever (frames 1150-1170). (c) Right hand operating the shift lever (frames 701-709). (d) Operating the shift lever (frames 794-830). (e) Right hand moving to head from steering wheel (frames 3318-3340). (f) Right hand moving back to steering wheel (frames 2792-2811). (g) Right hand moving back to steering wheel from dashboard (frames 4109-4134). (h) Right hand moving to dashboard from steering wheel (frames 4617-4638).

Figure 5: A schematic illustration of PHOG: (a) level 0, (b) level 1, (c) level 2. At each resolution level, PHOG consists of a histogram of orientation gradients over each image subregion.
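The pyramid construction illustrated in Figure 5 can be sketched in NumPy. This is an assumption-laden toy (unsigned orientations, levels 0-2 giving 1x1, 2x2, and 4x4 grids, 8 bins per histogram); the paper's exact PHOG parameters may differ:

```python
import numpy as np

def phog(image, levels=2, bins=8):
    """Toy PHOG: concatenate orientation histograms over an n x n grid
    at each pyramid level (n = 2**level). Illustrative parameters only."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation in [0, pi)
    feats = []
    h, w = image.shape
    for level in range(levels + 1):
        n = 2 ** level  # grid resolution at this pyramid level
        for i in range(n):
            for j in range(n):
                rs, re = i * h // n, (i + 1) * h // n
                cs, ce = j * w // n, (j + 1) * w // n
                # Magnitude-weighted histogram of orientations in the cell.
                hist, _ = np.histogram(ang[rs:re, cs:ce], bins=bins,
                                       range=(0, np.pi),
                                       weights=mag[rs:re, cs:ce])
                feats.append(hist)
    return np.concatenate(feats)

# Toy "silhouette" standing in for an MHI.
img = np.zeros((32, 32))
img[8:24, 8:24] = 1.0
f = phog(img)
print(f.shape)  # (1 + 4 + 16) cells x 8 bins = 168 entries
```

With levels 0-2 the descriptor length is (1 + 4 + 16) x 8 = 168; adding a level quadruples the number of cells at that level.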

layer linear perceptron by adding hidden layers in between [41]. An MLP is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes, that is, an input layer, one or more hidden layers, and an output layer. An MLP classifier is usually trained by the error backpropagation algorithm.
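As a hedged sketch (not the paper's implementation), an MLP of this kind can be instantiated with scikit-learn's MLPClassifier, which trains by backpropagation; the toy features below merely stand in for PHOG vectors:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
# Two well-separated Gaussian blobs as stand-in "PHOG" feature vectors.
X = np.vstack([rng.normal(0.0, 0.5, (50, 8)),
               rng.normal(3.0, 0.5, (50, 8))])
y = np.array([0] * 50 + [1] * 50)

# One hidden layer of 16 nodes between the input and output layers.
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
mlp.fit(X, y)
print(mlp.score(X, y))
```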

5.1.3. Support Vector Machine. The support vector machine (SVM) is one of the most commonly applied supervised learning algorithms. An SVM is formally defined by a separating hyperplane in a high- or infinite-dimensional space. Given labeled training data, the SVM generates an optimal hyperplane to categorize new examples. Intuitively, the SVM algorithm finds the hyperplane that gives the largest minimum distance to the training examples; that is, the optimal separating hyperplane maximizes the margin of the training data.
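A minimal illustration of the maximum-margin idea, assuming scikit-learn's SVC with a linear kernel on made-up 2-D points (not the paper's data):

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable clusters.
X = np.array([[0.0, 0.0], [0.5, 0.5], [0.0, 1.0],
              [3.0, 3.0], [3.5, 2.5], [4.0, 3.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)
# The support vectors are the training points closest to the hyperplane;
# they alone determine the maximum-margin boundary.
print(clf.support_vectors_)
print(clf.predict([[1.0, 1.0], [3.0, 2.8]]))
```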

6. Experiments

6.1. Holdout Experiment. We chose two standard experimental procedures, the holdout approach and the cross-validation approach, to verify the driving action recognition performance using the RF classifier and the PHOG feature extracted from the MHI. Three other classifiers, kNN, MLP, and SVM, are compared.

Figure 6: Bar plots of classification rates from the holdout experiment, with 80% of the data used for training and the remainder for testing (kNN 88.01%, RF 96.56%, SVM 94.43%, MLP 90.93%).

In the holdout experiment, 20% of the PHOG features are randomly selected as the testing dataset, while the remaining 80% are used as the training dataset. The holdout experiment is repeated 100 times and the classification results are recorded. In each holdout cycle, the same training and testing datasets are applied to the four different classifiers simultaneously to compare their performance.
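The repeated-holdout protocol can be sketched as follows, assuming scikit-learn and synthetic stand-ins for the PHOG features of the four action classes (the paper repeats the split 100 times; the sketch uses 10 to keep it quick):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# Four synthetic "action" classes standing in for PHOG features.
X = np.vstack([rng.normal(i, 0.6, (40, 10)) for i in range(4)])
y = np.repeat(np.arange(4), 40)

accs = []
for run in range(10):  # the paper uses 100 repetitions
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=run)
    rf = RandomForestClassifier(n_estimators=50, random_state=run)
    rf.fit(X_tr, y_tr)
    accs.append(rf.score(X_te, y_te))
print(np.mean(accs))  # average accuracy over the random splits
```

The same train/test indices would be reused for each classifier per cycle when comparing them.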

Classification accuracy is one of the most common indicators used to evaluate classification performance. Figures 6 and 7 show the bar plots and box plots of the classification accuracies from the four classifiers on the same decomposed driving actions. The results are averaged over 100 runs. The average classification accuracies of the kNN, RF, SVM, and MLP classifiers are 88.01%, 96.56%, 94.43%, and 90.93%, respectively. The RF classifier clearly performs the best among the four classifiers compared.

To further evaluate the performance of the RF classifier, a confusion matrix is used to visualize the discrepancy between the actual class labels and the predicted results. The confusion matrix gives a full picture of the errors made by a classification model and shows how its predictions are made. The rows correspond to the known classes of the data, that is, the labels in the data; the columns correspond to the predictions made by the model. Each element of the matrix is the number of predictions of the class given by its column for examples whose true class is given by its row. Thus the diagonal elements show the number of correct classifications made for each class, and the off-diagonal elements show the errors made. Figure 8 shows the confusion matrix from the above experiment for the RF classifier.
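The row/column convention described above can be made concrete with a small NumPy sketch (the toy labels are illustrative):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Entry [i, j] counts samples of true class i predicted as class j
    (rows = true classes, columns = predicted classes)."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 1, 1, 1, 2, 3, 3, 3]
cm = confusion_matrix(y_true, y_pred, 4)
print(cm)
# Diagonal entries are correct classifications; off-diagonal entries are errors.
print(np.trace(cm) / cm.sum())  # overall accuracy: 6 correct of 8 = 0.75
```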

Figure 7: Box plots of classification rates from the holdout experiment, with 80% of the data used for training and the remainder for testing.

Figure 8: Confusion matrix of the RF classification result from the holdout experiment (rows: true classes; columns: predicted classes).

            Action 1   Action 2   Action 3   Action 4
Action 1      0.9570     0.0200     0.0100     0.0130
Action 2      0.0287     0.9713     0          0
Action 3      0.0038     0          0.9462     0.0500
Action 4      0          0          0.0123     0.9877

In the figure, the classes labelled one, two, three, and four correspond to hand interaction with the shift lever, operating the shift lever, interaction with the head, and interaction with the dashboard, respectively. In the confusion matrix, the columns are the predicted classes while the rows are the true ones. For the RF classifier, the average classification rate over the four driving actions is 96.56%. The respective classification accuracies for the four driving actions in the holdout experiment are 95.70%, 97.13%, 94.62%, and 98.77%. Classes one and two tend to be confused with each other, with error rates of about 2% and 2.87%, respectively. On the other hand, the error rates from the confusion between classes three and four lie between 1.2% and 5%.

6.2. k-Fold Cross-Validation. The second part of our experiment uses k-fold cross-validation to further confirm the classification performance on the driving actions. In k-fold cross-validation, the original data set is randomly partitioned into k subsets. One subset is retained as the validation data for testing, while the remaining k - 1 subsets are used as training data. The cross-validation process is then repeated k times, so that each of the k subsamples is used exactly once as the validation data. The estimate is the average of the k results. The key property of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once. We chose 10-fold cross-validation in the experiment, meaning that nine of the ten splits are used for training and the remaining one is reserved for testing.

Figure 9: Bar plots of classification rates from 10-fold cross-validation (kNN 94.64%, RF 98.30%, SVM 97.39%, MLP 95.91%).

Figure 10: Box plots of classification rates from 10-fold cross-validation.

The evaluation procedure is similar to the holdout experiment. The cross-validation experiment was also conducted 100 times for each of the classification methods. Each time, the PHOG features extracted from the driving action dataset were randomly divided into 10 folds. The average classification accuracies over the 100 repetitions are shown in the bar plots of Figure 9 and the box plots of Figure 10. The average classification accuracies of the kNN, RF, SVM, and MLP classifiers are 94.64%, 98.30%, 97.39%, and 95.91%, respectively. From the bar plots, box plots, and confusion matrix in Figures 9, 10, and 11, the RF classifier clearly outperforms the other three classifiers compared.
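Under the same assumptions as the holdout sketch (scikit-learn, synthetic stand-in features), one pass of the 10-fold protocol might look like:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

rng = np.random.default_rng(2)
# Four synthetic "action" classes standing in for PHOG features.
X = np.vstack([rng.normal(i, 0.6, (30, 10)) for i in range(4)])
y = np.repeat(np.arange(4), 30)

# Each of the 10 folds serves exactly once as the validation set.
kf = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = []
for train_idx, test_idx in kf.split(X, y):
    rf = RandomForestClassifier(n_estimators=50, random_state=0)
    rf.fit(X[train_idx], y[train_idx])
    scores.append(rf.score(X[test_idx], y[test_idx]))
print(np.mean(scores))  # the estimate is the average of the k results
```

Repeating this whole loop with fresh random partitions gives the 100-repetition average reported above.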

Figure 11: Confusion matrix of RF classification from the 10-fold cross-validation experiment (rows: true classes; columns: predicted classes).

            Action 1   Action 2   Action 3   Action 4
Action 1      0.9657     0.0237     0.0004     0.0102
Action 2      0.0292     0.9708     0          0
Action 3      0.0041     0          0.9483     0.0476
Action 4      0          0          0.0147     0.9853

7. Conclusion

In this paper, we proposed an efficient approach to recognise driving action primitives by joint application of motion history image and pyramid histogram of oriented gradients. The proposed driving action primitives lead to a hierarchical representation of driver activities. The manually labelled action primitives are jointly represented by the motion history image and the pyramid histogram of oriented gradients (PHOG). The random forest classifier was exploited to evaluate the classification performance, giving an accuracy of over 94% in both the holdout and cross-validation experiments. This compares favorably with some other commonly used classification methods.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The project is supported by Suzhou Municipal Science and Technology Foundation Grants SS201109 and SYG201140.

References

[1] "WHO: World report on road traffic injury prevention," http://www.who.int/violence_injury_prevention/publications/road_traffic/world_report/en/.

[2] T. B. Moeslund, A. Hilton, and V. Kruger, "A survey of advances in vision-based human motion capture and analysis," Computer Vision and Image Understanding, vol. 104, no. 2-3, pp. 90–126, 2006.

[3] X. Fan, B.-C. Yin, and Y.-F. Sun, "Yawning detection for monitoring driver fatigue," in Proceedings of the 6th International Conference on Machine Learning and Cybernetics (ICMLC '07), pp. 664–668, Hong Kong, August 2007.

[4] R. Grace, V. E. Byrne, D. M. Bierman, et al., "A drowsy driver detection system for heavy vehicles," in Proceedings of the 17th Digital Avionics Systems Conference, vol. 2, pp. I361–I368, Bellevue, Wash, USA, 1998.


[5] E. Wahlstrom, O. Masoud, and N. Papanikolopoulos, "Vision-based methods for driver monitoring," in Proceedings of the IEEE Intelligent Transportation Systems, vol. 2, pp. 903–908, Shanghai, China, October 2003.

[6] A. Doshi and M. M. Trivedi, "On the roles of eye gaze and head dynamics in predicting driver's intent to change lanes," IEEE Transactions on Intelligent Transportation Systems, vol. 10, no. 3, pp. 453–462, 2009.

[7] J. J. Jo, S. J. Lee, H. G. Jung, K. R. Park, and J. J. Kim, "Vision-based method for detecting driver drowsiness and distraction in driver monitoring system," Optical Engineering, vol. 50, no. 12, pp. 127202–127224, 2011.

[8] X. Liu, Y. D. Zhu, and K. Fujimura, "Real-time pose classification for driver monitoring," in Proceedings of the IEEE Intelligent Transportation Systems, pp. 174–178, Singapore, September 2002.

[9] P. Watta, S. Lakshmanan, and Y. Hou, "Nonparametric approaches for estimating driver pose," IEEE Transactions on Vehicular Technology, vol. 56, no. 4, pp. 2028–2041, 2007.

[10] T. Kato, T. Fujii, and M. Tanimoto, "Detection of driver's posture in the car by using far infrared camera," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 339–344, Parma, Italy, June 2004.

[11] S. Y. Cheng, S. Park, and M. M. Trivedi, "Multi-spectral and multi-perspective video arrays for driver body tracking and activity analysis," Computer Vision and Image Understanding, vol. 106, no. 2-3, pp. 245–257, 2007.

[12] H. Veeraraghavan, N. Bird, S. Atev, and N. Papanikolopoulos, "Classifiers for driver activity monitoring," Transportation Research C, vol. 15, no. 1, pp. 51–67, 2007.

[13] H. Veeraraghavan, S. Atev, N. Bird, P. Schrater, and N. Papanikolopoulos, "Driver activity monitoring through supervised and unsupervised learning," in Proceedings of the 8th International IEEE Conference on Intelligent Transportation Systems, pp. 895–900, Vienna, Austria, September 2005.

[14] C. H. Zhao, B. L. Zhang, J. He, and J. Lian, "Recognition of driving postures by contourlet transform and random forests," IET Intelligent Transport Systems, vol. 6, no. 2, pp. 161–168, 2012.

[15] C. H. Zhao, B. L. Zhang, X. Z. Zhang, S. Q. Zhao, and H. X. Li, "Recognition of driving postures by combined features and random subspace ensemble of multilayer perceptron classifier," Journal of Neural Computing and Applications, vol. 22, no. 1, Supplement, pp. 175–184, 2000.

[16] C. Tran, A. Doshi, and M. M. Trivedi, "Modeling and prediction of driver behavior by foot gesture analysis," Computer Vision and Image Understanding, vol. 116, no. 3, pp. 435–445, 2012.

[17] J. C. McCall and M. M. Trivedi, "Visual context capture and analysis for driver attention monitoring," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 332–337, Washington, DC, USA, October 2004.

[18] K. Torkkola, N. Massey, and C. Wood, "Driver inattention detection through intelligent analysis of readily available sensors," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 326–331, Washington, DC, USA, October 2004.

[19] D. A. Johnson and M. M. Trivedi, "Driving style recognition using a smartphone as a sensor platform," in Proceedings of the 14th IEEE International Intelligent Transportation Systems Conference (ITSC '11), pp. 1609–1615, Washington, DC, USA, October 2011.

[20] A. V. Desai and M. A. Haque, "Vigilance monitoring for operator safety: a simulation study on highway driving," Journal of Safety Research, vol. 37, no. 2, pp. 139–147, 2006.

[21] G. Rigas, Y. Goletsis, P. Bougia, and D. I. Fotiadis, "Towards driver's state recognition on real driving conditions," International Journal of Vehicular Technology, vol. 2011, Article ID 617210, 14 pages, 2011.

[22] M. H. Kutila, M. Jokela, T. Makinen, J. Viitanen, G. Markkula, and T. W. Victor, "Driver cognitive distraction detection: feature estimation and implementation," Proceedings of the Institution of Mechanical Engineers D, vol. 221, no. 9, pp. 1027–1040, 2007.

[23] A. F. Bobick and J. W. Davis, "The recognition of human movement using temporal templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 257–267, 2001.

[24] Y. Wang, K. Huang, and T. Tan, "Human activity recognition based on R transform," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–8, Minneapolis, Minn, USA, June 2007.

[25] H. Chen, H. Chen, Y. Chen, and S. Lee, "Human action recognition using star skeleton," in Proceedings of the International Workshop on Video Surveillance and Sensor Networks (VSSN '06), pp. 171–178, Santa Barbara, Calif, USA, October 2006.

[26] W. Liang and D. Suter, "Informative shape representations for human action recognition," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 1266–1269, Hong Kong, August 2006.

[27] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proceedings of the International Conference on Computer Vision, vol. 2, pp. 726–733, Nice, France, October 2003.

[28] A. R. Ahad, T. Ogata, J. K. Tan, H. S. Kim, and S. Ishikawa, "Motion recognition approach to solve overwriting in complex actions," in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG '08), pp. 1–6, Amsterdam, The Netherlands, September 2008.

[29] S. Ali and M. Shah, "Human action recognition in videos using kinematic features and multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 288–303, 2010.

[30] S. Danafar and N. Gheissari, "Action recognition for surveillance applications using optic flow and SVM," in Proceedings of the Asian Conference on Computer Vision (ACCV '07), pp. 457–466, Tokyo, Japan, November 2007.

[31] I. Laptev and T. Lindeberg, "Space-time interest points," in Proceedings of the 9th IEEE International Conference on Computer Vision, pp. 432–439, Nice, France, October 2003.

[32] I. Laptev, B. Caputo, C. Schuldt, and T. Lindeberg, "Local velocity-adapted motion events for spatio-temporal recognition," Computer Vision and Image Understanding, vol. 108, no. 3, pp. 207–229, 2007.

[33] S. Park and M. Trivedi, "Driver activity analysis for intelligent vehicles: issues and development framework," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 644–649, Las Vegas, Nev, USA, June 2005.

[34] A. Bosch, A. Zisserman, and X. Munoz, "Representing shape with a spatial pyramid kernel," in Proceedings of the 6th ACM International Conference on Image and Video Retrieval (CIVR '07), pp. 401–408, Amsterdam, The Netherlands, July 2007.


[35] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, San Diego, Calif, USA, June 2005.

[36] G. R. Bradski and J. Davis, "Motion segmentation and pose recognition with motion history gradients," in Proceedings of the 5th IEEE Workshop on Applications of Computer Vision, pp. 238–244, Palm Springs, Calif, USA, 2000.

[37] O. Masoud and N. Papanikolopoulos, "A method for human action recognition," Image and Vision Computing, vol. 21, no. 8, pp. 729–743, 2003.

[38] H. Yi, D. Rajan, and L.-T. Chia, "A new motion histogram to index motion content in video segments," Pattern Recognition Letters, vol. 26, no. 9, pp. 1221–1231, 2005.

[39] J. Flusser, T. Suk, and B. Zitova, Moments and Moment Invariants in Pattern Recognition, John Wiley & Sons, Chichester, UK, 2009.

[40] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), pp. 2169–2178, New York, NY, USA, June 2006.

[41] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, 2nd edition, 2000.

[42] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.


8 International Journal of Vehicular Technology

kNN RF SVM MLP

Accuracy

88019656 9443

9093

0

01

02

03

04

05

06

07

08

09

1

Classifiers

Figure 6 Bar plots of classification rates from holdout experimentwith 80 of data are used for training and the remaining for testing

performance using RF classifier and the PHOG featureextracted from MHI Other three classifiers 119896NN MLP andSVM will be compared

In the holdout experiment 20 of the PHOG featuresare randomly selected as testing dataset while the remaining80 of the features are used as training dataset The holdoutexperiment is usually repeated 100 times and the classificationresults are recorded In each holdout experiment cycle thesame training and testing dataset are applied to the fourdifferent classifiers simultaneously to compare their perfor-mance

Classification accuracy is one of the most common indicators used to evaluate classification performance. Figures 6 and 7 show the bar plots and box plots of the classification accuracies of the four classifiers on the same decomposed driving actions. The results are averaged over 100 runs. The average classification accuracies of the kNN, RF, SVM, and MLP classifiers are 88.01%, 96.56%, 94.43%, and 90.93%, respectively. The RF classifier clearly performs the best among the four classifiers compared.

To further evaluate the performance of the RF classifier, a confusion matrix is used to visualize the discrepancy between the actual class labels and the predicted results. The confusion matrix gives a full picture of the errors made by a classification model. The rows correspond to the known classes of the data, that is, the labels in the dataset, while the columns correspond to the predictions made by the model. Each element of the matrix is the number of predictions of the class corresponding to the column for samples whose true class is represented by the row. Thus, the diagonal elements show the number of correct classifications made for each class, and the off-diagonal elements show the errors made. Figure 8 shows the confusion matrix from the above experiment for the RF classifier.
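A row-normalised confusion matrix of this kind, in which the diagonal holds each class's classification rate, can be computed as in the minimal sketch below; the labels and predictions used here are hypothetical toy values.

```python
# Build a confusion matrix (rows = true classes, columns = predictions)
# and normalise each row so the diagonal gives per-class accuracy.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 2, 2, 3, 3, 3, 4, 4]   # hypothetical action labels
y_pred = [1, 1, 2, 2, 2, 3, 3, 4, 4, 4]   # hypothetical predictions

cm = confusion_matrix(y_true, y_pred, labels=[1, 2, 3, 4])
cm_rates = cm / cm.sum(axis=1, keepdims=True)  # each row now sums to 1

per_class_acc = np.diag(cm_rates)  # here: [2/3, 1.0, 2/3, 1.0]
print(cm_rates)
```

Normalising by row totals is what makes the entries in Figures 8 and 11 directly readable as classification and confusion rates rather than raw counts.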

Figure 7: Box plots of classification rates from the holdout experiment, with 80% of the data used for training and the remaining 20% for testing.

Figure 8: Confusion matrix of the RF classification result from the holdout experiment (rows: true classes; columns: predicted classes).

             Action 1   Action 2   Action 3   Action 4
  Action 1    0.9570     0.0200     0.0100     0.0130
  Action 2    0.0287     0.9713     0          0
  Action 3    0.0038     0          0.9462     0.0500
  Action 4    0          0          0.0123     0.9877

In the figure, the classes labelled one, two, three, and four correspond to interaction with the shift lever, operating the shift lever, interaction with the head, and interaction with the dashboard, respectively. In the confusion matrix, the columns are the predicted classes while the rows are the true ones. For the RF classifier, the average classification rate over the four driving actions is 96.56%, and the respective classification accuracies for the four actions in the holdout experiment are 95.70%, 97.13%, 94.62%, and 98.77%. The matrix shows that classes one and two tend to be confused with each other, with error rates of about 2% and 2.87%, respectively, while the error rates from the confusion between classes three and four lie between 1.23% and 5%.

6.2. k-Fold Cross-Validation. The second part of our experiment uses k-fold cross-validation to further confirm the classification performance on the driving actions. In k-fold cross-validation, the original dataset is randomly partitioned into k subsets. One subset is retained as the validation data for testing, while the remaining k - 1 subsets are used as training data. The cross-validation process is then repeated k times, which means that each of the k


Figure 9: Bar plots of classification rates from 10-fold cross-validation. Average accuracies: kNN 94.64%, RF 98.30%, SVM 97.39%, MLP 95.91%.

Figure 10: Box plots of classification rates from 10-fold cross-validation.

subsamples is used exactly once as the validation data. The performance estimate is the average of the k results. The key property of this method is that all observations are used for both training and validation, and each observation is used for validation exactly once. We chose 10-fold cross-validation in the experiment, which means that nine of the ten split sets are used for training and the remaining one is reserved for testing.
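The 10-fold procedure described above can be sketched with scikit-learn's KFold; as before, the stand-in data and the default random forest settings are illustrative assumptions rather than the paper's configuration.

```python
# 10-fold cross-validation: the data are partitioned into 10 folds,
# each fold serves once as the test set, and the accuracy estimate
# is the mean of the 10 fold scores.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold, cross_val_score

X, y = make_classification(n_samples=200, n_features=50, n_classes=4,
                           n_informative=10, random_state=0)  # stand-in data

# shuffle=True draws a fresh random partition; re-seeding this is how
# the experiment's 100 repetitions produce different 10-fold splits
cv = KFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y, cv=cv)

print(f"mean accuracy over 10 folds: {scores.mean():.4f}")
```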

The evaluation procedure is similar to that of the holdout experiment. The cross-validation experiment was also conducted 100 times for each classification method; each time, the PHOG features extracted from the driving action dataset were randomly divided into 10 folds. The average classification accuracies over the 100 repetitions are shown in the bar plots of Figure 9 and the box plots of Figure 10. The average classification accuracies of the kNN, RF, SVM, and MLP classifiers are 94.64%, 98.30%, 97.39%, and 95.91%, respectively. From the bar plots, box plots, and confusion matrix in Figures 9, 10, and 11, the RF classifier clearly outperforms the other three classifiers compared.

Figure 11: Confusion matrix of the RF classification from the 10-fold cross-validation experiment (rows: true classes; columns: predicted classes).

             Action 1   Action 2   Action 3   Action 4
  Action 1    0.9657     0.0237     0.0004     0.0102
  Action 2    0.0292     0.9708     0          0
  Action 3    0.0041     0          0.9483     0.0476
  Action 4    0          0          0.0147     0.9853

7 Conclusion

In this paper, we proposed an efficient approach to recognising driving action primitives by the joint application of motion history image and pyramid histogram of oriented gradients. The proposed driving action primitives lead to a hierarchical representation of driver activities. The manually labelled action primitives are jointly represented by the motion history image and the pyramid histogram of oriented gradients (PHOG). The random forest classifier was exploited to evaluate the classification performance, giving an accuracy of over 94% in both the holdout and cross-validation experiments. This compares favorably with some other commonly used classification methods.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

Acknowledgments

The project is supported by Suzhou Municipal Science and Technology Foundation Grants SS201109 and SYG201140.


Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Advances inAcoustics ampVibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

thinspJournalthinspofthinsp

Sensors

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Antennas andPropagation

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

International Journal of Vehicular Technology 9

RF SVM MLP

Accuracy

9464983 9739 9591

0

01

02

03

04

05

06

07

08

09

1

ClassifierskNN

Figure 9 Bar plots of classification rates from 10-fold cross-validation

1

095

09

085

08

075

RF SVM MLPClassifiers

Accuracy

+

+++

++

kNN

Figure 10 Box plots of classification rates from 10-fold cross-validation

subsamples will be used exactly once as the validation dataThe estimation can be the average of the 119896 results The keyproperty of this method is that all observations are used forboth training and validation and each observation is used forvalidation exactly once We chose 10-fold cross-validation inthe experiment whichmeans that nine of the ten splitted setsare used for training and the remaining one is reserved fortesting

The evaluation procedure is similar to the holdout exper-iment The cross-validation experiment was also conducted100 times for each of the classificationmethods Each time thePHOG feature extracted from the driving action dataset wasrandomly divided into 10 folders The average classificationaccuracies of the 100 repetitions are shown in the bar plots ofFigure 9 and box plots of Figure 10The average classificationaccuracies of 119896-NN classifier RF classifier SVM classifierand MLP classifier are 9464 9830 9739 and 9591respectively From the bar plots box plots and confusionmatrix in Figures 9 10 and 11 the RF classifier clearlyoutperforms other three classifiers compared

Action 1

Action 2

Action 3

Action 4

Action 1 Action 2 Action 3 Action 4

09657 00237 00004 00102

00292 09708 0 0

00041 0 09483 00476

0 0 00147 09853

Figure 11 Confusion matrix of RF classification from 10-fold cross-validation experiment

7 Conclusion

In this paper we proposed an efficient approach to recognisedriving action primitives by joint application of motion his-tory image and pyramid histogram of oriented gradientsTheproposed driving action primitives lead to the hierarchicalrepresentation of driver activities The manually labelledaction primitives are jointly represented by motion historyimage and pyramid histogram of oriented gradient (PHOG)The random forest classifier was exploited to evaluate theclassification performance which gives an accuracy of over94 from the holdout and cross-validation experimentsThis compares favorably over some other commonly usedclassifications methods

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

The project is supported by Suzhou Municipal Science andTechnology Foundation Grants SS201109 and SYG201140

References

[1] ldquoWHO World report on road traffic injury preventionrdquohttpwwwwhointviolence injury preventionpublicationsroad trafficworld reporten

[2] T B Moeslund A Hilton and V Kruger ldquoA survey of advancesin vision-based humanmotion capture and analysisrdquo ComputerVision and Image Understanding vol 104 no 2-3 pp 90ndash1262006

[3] X Fan B-C Yin and Y-F Sun ldquoYawning detection for mon-itoring driver fatiguerdquo in Proceedings of the 6th InternationalConference on Machine Learning and Cybernetics (ICMLC rsquo07)pp 664ndash668 Hong Kong August 2007

[4] R Grace V E Byrne D M Bierman et al ldquoA drowsy driverdetection system for heavy vehiclesrdquo in Proceedings of the 17thDigital Avionics Systems Conference vol 2 pp I361ndashI368Bellevue Wash USA 1998

10 International Journal of Vehicular Technology

[5] E Wahlstrom O Masoud and N Papanikolopoulos ldquoVision-based methods for driver monitoringrdquo in Proceedings of theIEEE Intelligent Transportation Systems vol 2 pp 903ndash908Shanghai China October 2003

[6] A Doshi and M M Trivedi ldquoOn the roles of eye gaze and headdynamics in predicting driverrsquos intent to change lanesrdquo IEEETransactions on Intelligent Transportation Systems vol 10 no3 pp 453ndash462 2009

[7] J J Jo S J Lee H G Jung K R Park and J J Kim ldquoVision-basedmethod for detecting driver drowsiness and distraction indriver monitoringsystemrdquo Optical Engineering vol 50 no 12pp 127202ndash127224 2011

[8] X Liu Y D Zhu and K Fujimura ldquoReal-time pose classicationfor driver monitoringrdquo in Proceedings of the IEEE IntelligentTransportation Systems pp 174ndash178 Singapore September2002

[9] P Watta S Lakshmanan and Y Hou ldquoNonparametricapproaches for estimating driver poserdquo IEEE Transactionson Vehicular Technology vol 56 no 4 pp 2028ndash2041 2007

[10] T Kato T Fujii andM Tanimoto ldquoDetection of driverrsquos posturein the car by using far infrared camerardquo in Proceedings of theIEEE Intelligent Vehicles Symposium pp 339ndash344 Parma ItalyJune 2004

[11] S Y Cheng S Park and M M Trivedi ldquoMulti-spectral andmulti-perspective video arrays for driver body tracking andactivity analysisrdquo Computer Vision and Image Understandingvol 106 no 2-3 pp 245ndash257 2007

[12] H Veeraraghavan N Bird S Atev and N Papanikolopou-los ldquoClassifiers for driver activity monitoringrdquo TransportationResearch C vol 15 no 1 pp 51ndash67 2007

[13] H Veeraraghavan S Atev N Bird P Schrater and NPapanikolopoulos ldquoDriver activity monitoring through super-vised and unsupervised learningrdquo in Proceedings of the 8thInternational IEEE Conference on Intelligent TransportationSystems pp 895ndash900 Vienna Austria September 2005

[14] C H Zhao B L Zhang J He and J Lian ldquoRecognition ofdriving postures by contourlet transform and random forestsrdquoIET Intelligent Transport Systems vol 6 no 2 pp 161ndash168 2012

[15] C H Zhao B L Zhang X Z Zhang S Q Zhao and H XLi ldquoRecognition of driving postures by combined features andrandom subspace ensemble of multilayer perception classifferrdquoJournal of Neural Computing and Applications vol 22 no 1Supplement pp 175ndash184 2000

[16] C Tran A Doshi andMM Trivedi ldquoModeling and predictionof driver behavior by foot gesture analysisrdquo Computer Visionand Image Understanding vol 116 no 3 pp 435ndash445 2012

[17] J C McCall and M M Trivedi ldquoVisual context capture andanalysis for driver attention monitoringrdquo in Proceedings of the7th International IEEE Conference on Intelligent TransportationSystems (ITSC rsquo04) pp 332ndash337 Washington DC USA Octo-ber 2004

[18] K Torkkola N Massey and C Wood ldquoDriver inattentiondetection through intelligent analysis of readily available sen-sorsrdquo in Proceedings of the 7th International IEEE Conferenceon Intelligent Transportation Systems (ITSC rsquo04) pp 326ndash331Washington DC USA October 2004

[19] D A Johnson and M M Trivedi ldquoDriving style recognitionusing a smartphone as a sensor platformrdquo in Proceedings ofthe 14th IEEE International Intelligent Transportation SystemsConference (ITSC rsquo11) pp 1609ndash1615 Washington DC USAOctober 2011

[20] A V Desai and M A Haque ldquoVigilance monitoring foroperator safety a simulation study on highway drivingrdquo Journalof Safety Research vol 37 no 2 pp 139ndash147 2006

[21] G Rigas Y Goletsis P Bougia and D I Fotiadis ldquoTowardsdriverrsquos state recognition on real driving conditionsrdquo Interna-tional Journal of Vehicular Technology vol 2011 Article ID617210 14 pages 2011

[22] M H Kutila M Jokela T Makinen J Viitanen G MarkkulaandTWVictor ldquoDriver cognitive distraction detection featureestimation and implementationrdquoProceedings of the Institution ofMechanical Engineers D vol 221 no 9 pp 1027ndash1040 2007

[23] A F Bobick and J W Davis ldquoThe recognition of humanmovement using temporal templatesrdquo IEEE Transactions onPattern Analysis andMachine Intelligence vol 23 no 3 pp 257ndash267 2001

[24] Y Wang K Huang and T Tan ldquoHuman activity recognitionbased on R transformrdquo in Proceedings of the IEEE Conference onComputer Vision and Pattern Recognition (CVPR rsquo07) pp 1ndash8Minneapolis Minn USA June 2007

[25] H Chen H Chen Y Chen and S Lee ldquoHuman actionrecognition using star skeletonrdquo in Proceedings of the Inter-national Workshop on Video Surveillance and Sensor Networks(VSSN rsquo06) pp 171ndash178 Santa Barbara Calif USA October2006

[26] W Liang and D Suter ldquoInformative shape representationsfor human action recognitionrdquo in Proceedings of the 18thInternational Conference on Pattern Recognition (ICPR rsquo06) pp1266ndash1269 Hong Kong August 2006

[27] A A Efros A C Berg G Mori and J Malik ldquoRecognizingaction at a distancerdquo in Proceedings of the International Con-ference on Computer Vision vol 2 pp 726ndash733 Nice FranceOctober 2003

[28] A R Ahad T Ogata J K Tan H S Kim and S IshikawaldquoMotion recognition approach to solve overwriting in complexactionsrdquo in Proceedings of the 8th IEEE International Conferenceon Automatic Face and Gesture Recognition (FG rsquo08) pp 1ndash6Amsterdam The Netherlands September 2008

[29] S Ali and M Shah ldquoHuman action recognition in videosusing kinematic features and multiple instance learningrdquo IEEETransactions on Pattern Analysis and Machine Intelligence vol32 no 2 pp 288ndash303 2010

[30] S Danafar and N Gheissari ldquoAction recognition for surveil-lanceapplications using optic ow and SVMrdquo in Proceedings ofthe Asian Conferenceon Computer Vision (ACCV rsquo07) pp 457ndash466 Tokyo Japan November 2007

[31] I Laptev and T Lindeberg ldquoSpace-time interest pointsrdquo in Pro-ceedings of the 9th IEEE International Conference on ComputerVision pp 432ndash439 Nice France October 2003

[32] I Laptev B Caputo C Schuldt and T Lindeberg ldquoLocalvelocity-adapted motion events for spatio-temporal recogni-tionrdquo Computer Vision and Image Understanding vol 108 no3 pp 207ndash229 2007

[33] S Park and M Trivedi ldquoDriver activity analysis for intelligentvehicles issues and development frameworkrdquo in Proceedings ofthe IEEE Intelligent Vehicles Symposium pp 644ndash649 LasVegasNev USA June 2005

[34] A Bosch A Zisserman and X Munoz ldquoRepresenting shapewith a spatial pyramid kernelrdquo in Proceedings of the 6thACM International Conference on Image and Video Retrieval(CIVR rsquo07) pp 401ndash408 Amsterdam The Netherlands July2007

International Journal of Vehicular Technology 11

[35] N Dalal and B Triggs ldquoHistograms of oriented gradientsfor human detectionrdquo in Proceedings of the IEEE ComputerSociety Conference on Computer Vision and Pattern Recognition(CVPR rsquo05) pp 886ndash893 San Diego Calif USA June 2005

[36] G R Bradski and J Davis ldquoMotion segmentation and poserecognitionwithmotion history gradientsrdquo inProceedings of the5th IEEEWorkshop onApplications of Computer Vision pp 238ndash244 Palm Springs Calif USA 2000

[37] O Masoud and N Papanikolopoulos ldquoA method for humanaction recognitionrdquo Image and Vision Computing vol 21 no 8pp 729ndash743 2003

[38] H Yi D Rajan and L-T Chia ldquoA new motion histogram toindex motion content in video segmentsrdquo Pattern RecognitionLetters vol 26 no 9 pp 1221ndash1231 2005

[39] J Flusser T Suk and B ZitovMoments andMoment Invariantsin Pattern Recognition John Wiley amp Sons Chichester UK2009

[40] S Lazebnik C Schmid and J Ponce ldquoBeyond bags of featuresspatial pyramid matching for recognizing natural scene cate-goriesrdquo in Proceedings of the IEEE Computer Society Conferenceon Computer Vision and Pattern Recognition (CVPR rsquo06) pp2169ndash2178 New York NY USA June 2006

[41] R O Duda P E Hart and D G Stork Pattern ClassiffcationWiley-Interscience 2nd edition 2000

[42] L Breiman ldquoRandom forestsrdquoMachine Learning vol 45 no 1pp 5ndash32 2001

Submit your manuscripts athttpwwwhindawicom

VLSI Design

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

RotatingMachinery

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporation httpwwwhindawicom

Journal ofEngineeringVolume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Shock and Vibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mechanical Engineering

Advances in

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Civil EngineeringAdvances in

Advances inAcoustics ampVibration

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Electrical and Computer Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

DistributedSensor Networks

International Journal of

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

thinspJournalthinspofthinsp

Sensors

Modelling amp Simulation in EngineeringHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Active and Passive Electronic Components

Advances inOptoElectronics

Hindawi Publishing Corporation httpwwwhindawicom

Volume 2014

RoboticsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Chemical EngineeringInternational Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Control Scienceand Engineering

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Antennas andPropagation

Hindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Navigation and Observation

International Journal of

10 International Journal of Vehicular Technology

[5] E Wahlstrom O Masoud and N Papanikolopoulos ldquoVision-based methods for driver monitoringrdquo in Proceedings of theIEEE Intelligent Transportation Systems vol 2 pp 903ndash908Shanghai China October 2003

[6] A Doshi and M M Trivedi ldquoOn the roles of eye gaze and headdynamics in predicting driverrsquos intent to change lanesrdquo IEEETransactions on Intelligent Transportation Systems vol 10 no3 pp 453ndash462 2009

[7] J J Jo S J Lee H G Jung K R Park and J J Kim ldquoVision-basedmethod for detecting driver drowsiness and distraction indriver monitoringsystemrdquo Optical Engineering vol 50 no 12pp 127202ndash127224 2011

[8] X Liu Y D Zhu and K Fujimura ldquoReal-time pose classicationfor driver monitoringrdquo in Proceedings of the IEEE IntelligentTransportation Systems pp 174ndash178 Singapore September2002

[9] P Watta S Lakshmanan and Y Hou ldquoNonparametricapproaches for estimating driver poserdquo IEEE Transactionson Vehicular Technology vol 56 no 4 pp 2028ndash2041 2007

[10] T Kato T Fujii andM Tanimoto ldquoDetection of driverrsquos posturein the car by using far infrared camerardquo in Proceedings of theIEEE Intelligent Vehicles Symposium pp 339ndash344 Parma ItalyJune 2004

[11] S Y Cheng S Park and M M Trivedi ldquoMulti-spectral andmulti-perspective video arrays for driver body tracking andactivity analysisrdquo Computer Vision and Image Understandingvol 106 no 2-3 pp 245ndash257 2007

[12] H Veeraraghavan N Bird S Atev and N Papanikolopou-los ldquoClassifiers for driver activity monitoringrdquo TransportationResearch C vol 15 no 1 pp 51ndash67 2007

[13] H Veeraraghavan S Atev N Bird P Schrater and NPapanikolopoulos ldquoDriver activity monitoring through super-vised and unsupervised learningrdquo in Proceedings of the 8thInternational IEEE Conference on Intelligent TransportationSystems pp 895ndash900 Vienna Austria September 2005

[14] C H Zhao B L Zhang J He and J Lian ldquoRecognition ofdriving postures by contourlet transform and random forestsrdquoIET Intelligent Transport Systems vol 6 no 2 pp 161ndash168 2012

[15] C. H. Zhao, B. L. Zhang, X. Z. Zhang, S. Q. Zhao, and H. X. Li, "Recognition of driving postures by combined features and random subspace ensemble of multilayer perceptron classifiers," Neural Computing and Applications, vol. 22, no. 1 Supplement, pp. 175–184, 2013.

[16] C. Tran, A. Doshi, and M. M. Trivedi, "Modeling and prediction of driver behavior by foot gesture analysis," Computer Vision and Image Understanding, vol. 116, no. 3, pp. 435–445, 2012.

[17] J. C. McCall and M. M. Trivedi, "Visual context capture and analysis for driver attention monitoring," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 332–337, Washington, DC, USA, October 2004.

[18] K. Torkkola, N. Massey, and C. Wood, "Driver inattention detection through intelligent analysis of readily available sensors," in Proceedings of the 7th International IEEE Conference on Intelligent Transportation Systems (ITSC '04), pp. 326–331, Washington, DC, USA, October 2004.

[19] D. A. Johnson and M. M. Trivedi, "Driving style recognition using a smartphone as a sensor platform," in Proceedings of the 14th IEEE International Intelligent Transportation Systems Conference (ITSC '11), pp. 1609–1615, Washington, DC, USA, October 2011.

[20] A. V. Desai and M. A. Haque, "Vigilance monitoring for operator safety: a simulation study on highway driving," Journal of Safety Research, vol. 37, no. 2, pp. 139–147, 2006.

[21] G. Rigas, Y. Goletsis, P. Bougia, and D. I. Fotiadis, "Towards driver's state recognition on real driving conditions," International Journal of Vehicular Technology, vol. 2011, Article ID 617210, 14 pages, 2011.

[22] M. H. Kutila, M. Jokela, T. Makinen, J. Viitanen, G. Markkula, and T. W. Victor, "Driver cognitive distraction detection: feature estimation and implementation," Proceedings of the Institution of Mechanical Engineers, Part D, vol. 221, no. 9, pp. 1027–1040, 2007.

[23] A. F. Bobick and J. W. Davis, "The recognition of human movement using temporal templates," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 3, pp. 257–267, 2001.

[24] Y. Wang, K. Huang, and T. Tan, "Human activity recognition based on R transform," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pp. 1–8, Minneapolis, Minn, USA, June 2007.

[25] H. Chen, H. Chen, Y. Chen, and S. Lee, "Human action recognition using star skeleton," in Proceedings of the International Workshop on Video Surveillance and Sensor Networks (VSSN '06), pp. 171–178, Santa Barbara, Calif, USA, October 2006.

[26] W. Liang and D. Suter, "Informative shape representations for human action recognition," in Proceedings of the 18th International Conference on Pattern Recognition (ICPR '06), pp. 1266–1269, Hong Kong, August 2006.

[27] A. A. Efros, A. C. Berg, G. Mori, and J. Malik, "Recognizing action at a distance," in Proceedings of the International Conference on Computer Vision, vol. 2, pp. 726–733, Nice, France, October 2003.

[28] A. R. Ahad, T. Ogata, J. K. Tan, H. S. Kim, and S. Ishikawa, "Motion recognition approach to solve overwriting in complex actions," in Proceedings of the 8th IEEE International Conference on Automatic Face and Gesture Recognition (FG '08), pp. 1–6, Amsterdam, The Netherlands, September 2008.

[29] S. Ali and M. Shah, "Human action recognition in videos using kinematic features and multiple instance learning," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, no. 2, pp. 288–303, 2010.

[30] S. Danafar and N. Gheissari, "Action recognition for surveillance applications using optic flow and SVM," in Proceedings of the Asian Conference on Computer Vision (ACCV '07), pp. 457–466, Tokyo, Japan, November 2007.

[31] I. Laptev and T. Lindeberg, "Space-time interest points," in Proceedings of the 9th IEEE International Conference on Computer Vision, pp. 432–439, Nice, France, October 2003.

[32] I. Laptev, B. Caputo, C. Schuldt, and T. Lindeberg, "Local velocity-adapted motion events for spatio-temporal recognition," Computer Vision and Image Understanding, vol. 108, no. 3, pp. 207–229, 2007.

[33] S. Park and M. Trivedi, "Driver activity analysis for intelligent vehicles: issues and development framework," in Proceedings of the IEEE Intelligent Vehicles Symposium, pp. 644–649, Las Vegas, Nev, USA, June 2005.

[34] A. Bosch, A. Zisserman, and X. Munoz, "Representing shape with a spatial pyramid kernel," in Proceedings of the 6th ACM International Conference on Image and Video Retrieval (CIVR '07), pp. 401–408, Amsterdam, The Netherlands, July 2007.

International Journal of Vehicular Technology 11

[35] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '05), pp. 886–893, San Diego, Calif, USA, June 2005.

[36] G. R. Bradski and J. Davis, "Motion segmentation and pose recognition with motion history gradients," in Proceedings of the 5th IEEE Workshop on Applications of Computer Vision, pp. 238–244, Palm Springs, Calif, USA, 2000.

[37] O. Masoud and N. Papanikolopoulos, "A method for human action recognition," Image and Vision Computing, vol. 21, no. 8, pp. 729–743, 2003.

[38] H. Yi, D. Rajan, and L.-T. Chia, "A new motion histogram to index motion content in video segments," Pattern Recognition Letters, vol. 26, no. 9, pp. 1221–1231, 2005.

[39] J. Flusser, T. Suk, and B. Zitová, Moments and Moment Invariants in Pattern Recognition, John Wiley & Sons, Chichester, UK, 2009.

[40] S. Lazebnik, C. Schmid, and J. Ponce, "Beyond bags of features: spatial pyramid matching for recognizing natural scene categories," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR '06), pp. 2169–2178, New York, NY, USA, June 2006.

[41] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, 2nd edition, 2000.

[42] L. Breiman, "Random forests," Machine Learning, vol. 45, no. 1, pp. 5–32, 2001.
