
Improving Bicycle Safety through Automated Real-Time Vehicle Detection

Stephen Smaldone, Chetan Tonde, Vancheswaran K. Ananthanarayanan, Ahmed Elgammal, and Liviu Iftode

{smaldone, cjtonde, vanchi, elgammal, iftode}@cs.rutgers.edu

Technical Report DCS-TR-665

Department of Computer Science, Rutgers University

110 Frelinghuysen Rd, Piscataway, NJ 08854

August 2010

Abstract

The manner in which people use bicycles has changed very little since their invention in 1817. In that time, though, roadways have become congested with a dramatically less environmentally friendly mode of transportation: automobiles. These vehicles and the motorists who drive them represent, at times, a serious threat to the safety of both road cycling enthusiasts and bicycle commuters alike. Since bikers typically ride with the flow of traffic, the most dangerous situation for them is when they are being passed by a motorist from behind. As a result, a biker must spend a substantial amount of her cognitive and physical ability to periodically scan for rear-approaching vehicles, reducing her capacity to handle the bicycle safely and maintain continual awareness for both the forward and rearward situations.

To improve road cycling safety, we present a system that augments a standard bicycle with audio and video sensing, and computational capabilities. This Cyber-Physical bicycle system continuously senses the environment behind a biker, processes the sensed data utilizing audio processing and computer vision techniques, automatically detects the occurrences of rear-approaching vehicles, and alerts the biker in real time prior to the encounter. In this paper we present (i) the design of our prototype Cyber-Physical bicycle system and (ii) the results of our evaluation using video and audio traces collected from bikers. These results demonstrate the feasibility of the system, which exhibits a high degree of detection accuracy while operating under the real-time and energy constraints of the problem scenario.

1 Introduction

Since their invention in 1817 [2], bicycles have proven to be a healthy and environmentally friendly mode of transportation for both enthusiasts and commuters alike. Although the bicycle has remained ubiquitous over time, the world has changed dramatically. Today, US roadways are dominated by automobiles: inefficient, aggressive modes of human transport. Unfortunately, bikers are considered second-class citizens as they attempt to share roadways with motorists [30, 3]. In fact, this has been the situation for most of the lifetime of the bicycle. In 1896, the first automobile accident ever to occur in the U.S. took place in New York City between an automobile and a bicycle, and proved fatal for the cyclist [4]. According to a more recent report (2007) in the U.S., over 700 bicyclists die annually in accidents with automobiles, while there are over 44,000 annually reported cases of injuries due to bicycle-automobile accidents [5].

A key limiting factor for modern-day bikers is roadway safety. This is primarily due to the inherent, unbalanced burden of risk that is placed on a biker during cyclist-motorist encounters. As will become evident in Section 2, existing laws and approaches toward biker safety (e.g., bike paths) are inadequate solutions. At best, laws prescribe remedies for incidents after the fact, and bicycle paths are only as good as the limited coverage they provide. Due to the unbalanced nature of the risk during cyclist-motorist encounters, a biker is forced to focus a substantial portion of her cognitive and physical capabilities on the task of maintaining situational awareness, by continuously probing for the occurrence of rear-approaching vehicles. What is required is a preventative, biker-centric solution to the problem.

In this paper, we present an approach that solves the problem by offloading the low-level cognitive requirements from a biker to her bicycle. To support this approach, we enhance a standard bicycle with sensing and computational capabilities to create a Cyber-Physical bicycle system. The core goal of this system is to provide accurate and timely detection of rear-approaching vehicles to alert the biker of the pending encounter, through the cross-cutting application of mobile sensing, computer vision, and audio processing techniques. Our goal is to allow a bicycle to maintain situational awareness for a biker and provide updates to her as relevant changes occur that potentially impact her safety. To the best of our knowledge, our system is the first to equip bicycles with sensors for the purpose of improving road cycling safety. We consider this an important problem domain, as roadway safety is a key limiting factor in the adoption of bicycling as a viable form of environmentally friendly transportation.

The novel contributions of this work are:

• The design of an automated real-time detection system for roadway cycling utilizing a multimodal sensing approach. Our approach augments a bicycle with a camera, microphone, and computational capabilities; employs computer vision and audio processing techniques to detect when a motor vehicle is approaching from behind; and alerts the biker prior to each encounter.

• A prototype Cyber-Physical bicycle system enhanced with multimodal sensing (rear-facing camera and microphone) and on-board processing capabilities to perform real-time approaching-vehicle detection.

• The evaluation of our prototype system using real biker traces, which we collected (over 3 hours of roadway cycling video and audio traces, including more than 187 biker-motorist interactions), and real-time performance measurements. The results of this evaluation demonstrate the feasibility of the system, which exhibits a high degree of accuracy while continuously operating within the real-time and energy constraints of the problem scenario.

In the remaining sections of this paper, we review the background of this problem in the context of existing real-world solutions in Section 2, then we provide an overview of our solution in Section 3. Following this, we present the design of our system in Section 4, while Section 5 presents the results of our evaluation. We discuss a number of open issues and future work in Section 6. Finally, we review the related work in Section 7 and present our conclusions in Section 8.

2 Background and Motivation

In this section, we review the broad range of solutions that have attempted to improve biker safety. These solutions include legal and infrastructure approaches. In summary, a quick review of biker fatality and injury statistics shows consistent non-improvement [4, 5]: clearly, none of these approaches, either in isolation or taken as a whole, has improved biker safety very much. Finally, we focus on what we believe to be the core problem that still must be addressed.

Legal Approaches. Laws have been enacted in most states to require children to wear helmets while riding bikes, but they do not apply to adult bikers. Statistics show that the average age for cyclist fatalities is 40 years, and 30 years for cyclist injuries [5]. Although a helmet is likely to be effective for simple falls, it is unclear what protection it provides during an accident involving a motor vehicle. More recently, some states have even enacted laws to impose a three-foot safe passing limit on motorists passing bikers [6].

Unfortunately, laws do little to prevent accidents due to insufficient enforcement and can only prescribe after-the-fact remedies. According to statistical studies, three out of four at-fault drivers are not even cited for hitting and killing cyclists, and 22% of fatal accidents involved "hit-and-run" drivers, who were never found or charged. For example, in New York City, of the 92% of drivers who were at fault for killing a cyclist, 74% did not even receive a traffic citation [22]. In short, laws may help penalize offenders, if properly enforced, but they provide little actual preventative protection for the biker.

Infrastructure Approaches. Certain cities, for example, Amsterdam in The Netherlands, and Portland, OR, in the U.S., have built extensive networks of bicycle lanes [1] to promote safe cycling. Retrofitting existing roadways for bike lanes is difficult and costly. Therefore, adoption of such infrastructure change is slow, due to public inertia on the issue and funding competition from other, more popular public projects. In less bicycle-friendly cities, or suburban areas, infrastructure coverage is much less consistent, providing little safety improvement for bikers in those areas. Oddly, the arguably more bicycle-hostile areas are those that also reject the bicycle-lane idea, while the very areas where road cycling is well accepted provide the additional safety of bicycle lanes. In the end, bicycle lanes are only as good as the coverage they provide, require a strong public commitment to install and maintain, and need enforced legislation to ensure they are not improperly utilized (e.g., by illegally parked cars).

A normal bicycle augmented with sensors (video and audio), CPU, wireless networking, and GPS to create a Cyber-Physical bicycle system to detect rear-approaching motor vehicles. Alerts and related data collected by the system are transmitted to a centralized service where they are logged and stored.

Figure 1: Cyber-Physical Bicycle.

The Problem: Cognitive Overload. Since bikers typically ride with the flow of traffic, one of the more dangerous situations for them is when they are being passed from behind by a motor vehicle. To predict the occurrence of these situations, a biker must spend a substantial amount of her cognitive and physical ability to periodically scan for rear-approaching vehicles, reducing her capacity to handle the bicycle safely and maintain continual awareness for both the forward and rearward situations. In fact, when riding in groups along the side of roadways, bikers will commonly call forward to alert the members of their group about a rear-approaching vehicle, as a natural way to share the cognitive load.

Accompanying the cognitive aspects are the physical requirements of maintaining situational awareness. To detect the presence of a vehicle, a biker must look behind herself. This simple motion has two profound effects. First, by diverting her attention to the back, a biker loses the ability to track the roadway in front. This means that any approaching roadway hazard will go unnoticed by the biker during that period. Second, the physical act of looking back naturally causes a biker to drift to the side (i.e., either into the roadway or into the shoulder), due to the dynamics of the complex forces that act on a moving bicycle.

In an attempt to reduce the cognitive and physical effects of looking back, many bikers employ rear-view mirrors. Unlike the mirrors found in a car, bicycle mirrors are either handlebar or helmet mounted. In either form, they can be very distracting, and they often do not provide a broad enough range of view behind the cyclist, or a consistently good view throughout the active range of frequently changing cyclist riding positions. Regardless, the effects of periodically scanning a rear-view mirror are little better than those of directly scanning behind. More recently, products such as Cerevellum [9] provide a video-based rear-view mirror solution. Although it provides a continuous view of the situation behind a biker, it does not actually detect approaching motor vehicles.

Finally, various ways to alert motorists to the presence of a biker have been developed. These include bike reflectors, flashing lights, and reflective clothing. The goal of this approach is to make the biker more conspicuous to the motorist. Although these represent preventative measures, they do little to warn an unprepared biker of the presence of an approaching vehicle. The recently proposed LightLane [11] projects the image of a bike path around a bicycle on the roadway. The idea is to provide a bicycle lane that adapts to a biker's behavior by following her on the roadway, illustrating for motorists the safe passing distance. Although this concept goes farther than other visible cues for motorists, it still does not provide any notification to a biker regarding the presence of an approaching vehicle.

3 Overview

To illustrate how our Cyber-Physical bicycle can improve the safety of bikers, we provide a high-level overview of the system. The central element is a normal bicycle augmented with a set of sensors (audio and video) providing multiple modalities of sensing capabilities, compute resources in the form of a bicycle computer, and advanced wireless capabilities (3G, WiFi, and GPS). Figure 1 illustrates the key parts of the Cyber-Physical bicycle system and its core functionality.

As illustrated in the figure, a camera and microphone face backward from the bicycle's direction of forward motion. These sensors collect video and audio data samples and stream them to an embedded bicycle computer. Software executing on the computer continuously processes the data streams, in parallel, utilizing computer vision and audio signal processing techniques to perform rear-approaching motor vehicle detection. Further discussion of the detailed system design is deferred until Section 4 of the paper.

As a biker, shown in Figure 1, rides along a roadway on his normal daily route, his bicycle maintains situational awareness for him. As a motor vehicle approaches the biker from behind, this occurrence is detected by the bicycle computer and an audio notification is raised to the biker. Even after a notification has been raised, the system continues to track the approaching vehicle to determine the level of threat posed to the biker. To ascertain this, the system encapsulates the biker in a virtual safety zone, which is a three-foot perimeter around the bicycle. Any motor vehicle that crosses the threshold of this perimeter is considered to have violated the safety zone of the biker, and the system considers this an unsafe interaction. The virtual safety zone is visually depicted in Figure 6.

Whenever an unsafe interaction occurs, the Cyber-Physical bicycle system performs three actions. First, it produces an audible warning to notify the biker. This warning is distinct from the early notification produced when the vehicle is first detected. Second, an image of the offending vehicle is collected and transferred to a server at a centralized location. This is stored along with the location coordinates of the encounter, for future reference by the biker. Third, the encounter is logged at the centralized service and aggregated with the unsafe encounters of all other users. The purpose of this third action is to build aggregate safety statistics for roadways frequented by bikers, to be used as a safety metric in safe route planning.
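As a sketch of this three-step procedure, the following Python fragment models an unsafe-interaction record and the ordering of the actions. The record schema and all names are hypothetical, inferred from the description above; the paper does not specify a concrete data format or API.

```python
from dataclasses import dataclass, field
import time

@dataclass
class UnsafeEncounter:
    """Hypothetical record uploaded to the centralized service after an
    unsafe pass; the paper describes the actions but not a schema."""
    latitude: float
    longitude: float
    image_path: str          # snapshot of the offending vehicle
    timestamp: float = field(default_factory=time.time)

def handle_unsafe_pass(encounter, warn, upload, log):
    """Perform the three actions in order: audible warning first, then
    upload the image and coordinates, then log for aggregation."""
    warn()                   # distinct from the early-detection tone
    upload(encounter)
    log(encounter)
```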

4 Cyber-Physical Bicycle Design

In this section, we describe the design of our Cyber-Physical bicycle system. The goals of our design are threefold. First, the system must detect and track vehicles that approach from behind. Second, the system must be able to alert a biker in real time, while such an alert is still useful. Third, the system should distinguish between vehicles that approach a biker in a safe manner and those that approach in an unsafe manner.

There are a number of significant challenges that must be overcome when building such a system to automatically detect rear-approaching vehicles. They are:

• Limited Resources. Detecting approaching vehicles is a computationally intensive process. This is magnified by the real-time (latency-sensitive) requirements of the system. However, unlike engine-powered vehicles, bicycles have very limited available power generation capabilities. This, along with obvious weight restrictions, places a serious limitation on the computational resources that a bicycle can be equipped to carry for this purpose.


The four images are: (a) a rear-approaching vehicle, (b) optical flow of the rear-approaching vehicle in (a), (c) a rear-departing vehicle, and (d) optical flow of the rear-departing vehicle in (c).

Figure 2: Comparative Optical Flows of Approaching and Departing Cars.

• Platform Instability. As a moving platform, a bicycle is subject to a substantial amount of vibrational motion due to roadway conditions, as well as any rapid changes in direction caused by the biker or the environment around her (wind, roadway surface, interactions with other bikers, etc.). Therefore, even with commonly available image stabilization technologies, the resulting video stream obtained from a bicycle-mounted camera is subject to large amounts of jitter, sudden jarring vibrations, and rapid, unpredictable changes in orientation.

• Approaching Vehicle Directionality. Vehicles may approach a biker from both the rear and the front. For vehicle detection to be useful, distinguishing between these two cases is critical. Otherwise, an alert will be generated by the system for each vehicle encounter, regardless of the relative directions of motion, greatly reducing the effectiveness of the system.

4.1 Video-Based Detection

Since approaching vehicles travel at relatively high speeds compared to bicycles, it is necessary to detect them as early as possible, with high accuracy. Although we would like to leverage existing automobile driver assistance systems, which can detect the presence of vehicles in a driver's blind spot, we cannot. These systems rely on the fact that nearby vehicles are prominent in the sensed image's field of view (FOV). To provide ample notification time to a biker, the Cyber-Physical bicycle system must be able to detect approaching vehicles while they are still very small (as small as 2% of the FOV).

In this subsection, we describe the design of the video-based sensing and detection subsystem. To inform the components of this design, we make two observations. First, as a bicycle moves down a roadway, all stationary objects behind the bicycle will appear to recede. Therefore, a vehicle approaching from the rear will move counter to the motion of all other objects in the FOV. This visual cue is leveraged in the Optical Flow Analysis component (Section 4.1.1).

The second observation we make is that traffic on roadways follows a predictable pattern. Since bikers ride with the flow of traffic, a rear-approaching vehicle will always appear in the same roadway lane as the biker. Therefore, by identifying the natural segmentation of a roadway, we can reduce the image area (FOV) that must be analyzed, focusing on the only areas of the FOV where a rear-approaching vehicle is likely to appear. This visual cue is leveraged in the Roadway Segmentation Analysis component (Section 4.1.2).

Together, these two visual cues provide the necessary information for the video-based detection subsystem to reason about approaching vehicles and decide whether they present a hazardous condition to the biker. Additionally, the video-based subsystem performs real-time tracking of any detected rear-approaching vehicles, and classifies each as either a safe or unsafe approach. Figure 3 shows a block diagram of the components of the video processing subsystem (in blue), as well as the overall subsystem organization.

This block diagram presents the video (optical flow, road and lane detection, car detection, tracking of approaching/departing cars), audio (feature extraction, classification against an audio model), and combined (integration, alert) detection subsystems for the Cyber-Physical bicycle system.

Figure 3: Cyber-Physical Bicycle System Diagram.

This figure presents a graphical depiction of the optical flow trajectory for approaching and departing vehicles in u-v space. Red indicates high approaching relative speed, while blue indicates high departing relative speed.

Figure 4: Optical Flow Trajectory.

4.1.1 Optical Flow Analysis

Optical flow is the pattern of apparent motion of objects in a visual scene caused by the relative motion between an observer and the objects [21]. For any pixel in an image, a two-dimensional motion vector describing that pixel's relative motion can be computed using two or more consecutive frames. The flow can be computed at every pixel, which results in dense flow, or at certain pixels such as edge pixels, which results in sparse flow. Many techniques have been developed for computing optical flow from images, and they are used extensively in motion estimation and in video compression, for example in MPEG encoding [16]. Although computing optical flow is a very computationally intensive task, the inherently parallel nature of the computation can be exploited to accelerate it through the use of commodity Graphics Processing Units (GPUs).
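As an illustration of the kind of computation involved, the following minimal sketch estimates a block-wise flow field by block matching between two consecutive grayscale frames. It is a toy stand-in for the flow techniques cited in the text (the paper does not state which method its prototype uses), and the block and search sizes are arbitrary choices.

```python
import numpy as np

def block_flow(prev, curr, block=8, search=4):
    """Estimate a coarse optical flow field by block matching.

    For each `block` x `block` patch in `prev`, find the displacement
    (within +/- `search` pixels) minimizing the sum of absolute
    differences (SAD) against `curr`. Returns one (dy, dx) motion
    vector per block. Illustrative only, not the paper's method.
    """
    h, w = prev.shape
    vectors = np.zeros((h // block, w // block, 2), dtype=int)
    for bi in range(h // block):
        for bj in range(w // block):
            y, x = bi * block, bj * block
            patch = prev[y:y + block, x:x + block].astype(int)
            best, best_v = None, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    yy, xx = y + dy, x + dx
                    if yy < 0 or xx < 0 or yy + block > h or xx + block > w:
                        continue  # candidate window falls off the frame
                    cand = curr[yy:yy + block, xx:xx + block].astype(int)
                    sad = np.abs(cand - patch).sum()
                    if best is None or sad < best:
                        best, best_v = sad, (dy, dx)
            vectors[bi, bj] = best_v
    return vectors
```

In a real deployment this per-block search is exactly the kind of embarrassingly parallel workload that maps well onto a GPU, as the text notes.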

As discussed earlier, the motion of a rear-approaching vehicle is expected to be opposite to that of everything else in the FOV. This allows the approaching vehicle to be distinguished from all other objects in the camera's FOV; furthermore, this includes discriminating between approaching and departing vehicles.

A graphical representation of roadway segmentation depicting the boundaries that are determined automatically by the video subsystem for each video frame.

Figure 5: Roadway Segmentation.

Figure 2 shows an example of optical flow computed from images taken from video captured by a rear-facing camera mounted on the back of a moving bicycle. In the figure, there are four images: (a) a rear-approaching vehicle, (b) optical flow of the rear-approaching vehicle in (a), (c) a rear-departing vehicle, and (d) optical flow of the rear-departing vehicle in (c). Red pixels are moving towards the biker, while blue pixels are moving away; a stronger color (i.e., more red or blue) indicates a faster relative speed. From the figure, we observe that an approaching car exhibits a very distinct relative motion pattern compared to the rest of the scene: it is a dominant red spot in the middle of a bluish scene. The background variation in color (from blue to purple) is largely due to the effects of the side-to-side motion of the bicycle while being actively ridden.

Figure 4 shows the typical trajectories for approaching and departing vehicles, as projected in optical flow space. In the figure, the two axes represent the horizontal and vertical motions, and optical flow is color coded as before [15]. The trajectory data plotted in this figure is from the approaching and departing cars of Figure 2.

The optical flow trajectory for the approaching car starts from the origin, where the car is at its farthest visible distance and its speed is virtually indiscernible from the image. For the departing car, the optical flow trajectory converges to the origin, at which point the car's speed becomes indiscernible in the image due to distance. Clearly, optical flow is only useful when a vehicle is close enough to the bicycle that it shows distinct relative motion. As a vehicle moves farther from the bicycle, the signal-to-noise ratio in the optical flow calculation (for the car's image pixels) drops, ultimately becoming dominated by the effects of the biker's motions on the image quality (e.g., jitter).
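The directional cue described above can be sketched as a small decision rule: a candidate region whose mean flow opposes the mean background flow is approaching. This is an illustrative simplification of the trajectory analysis, not the authors' implementation, and the function name is hypothetical.

```python
import numpy as np

def classify_motion(flow, mask):
    """Label a detected region as 'approaching' or 'departing'.

    `flow` is an (H, W, 2) array of per-pixel motion vectors; `mask` is
    an (H, W) boolean array marking the candidate vehicle's pixels.
    Following the cue in the text -- an approaching vehicle moves
    counter to the receding background -- the region is labeled
    'approaching' when its mean flow opposes the mean background flow
    (negative dot product), and 'departing' otherwise.
    """
    region = flow[mask].mean(axis=0)        # mean flow over the vehicle
    background = flow[~mask].mean(axis=0)   # mean flow over everything else
    return "approaching" if np.dot(region, background) < 0 else "departing"
```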

4.1.2 Roadway Segmentation Analysis

To reduce the computational requirements of vehicle detection, the video-based detection subsystem segments the image of the roadway based upon the existing visible natural boundaries. The roadway is segmented from the rest of the scene based on its color appearance, using a statistical color model learned for the road color distribution. To facilitate this, a region close to the bicycle is chosen and used as a seed for frame-by-frame model generation. This roadway color model is then utilized to perform road segmentation on the rest of the scene. Figure 5 shows an example of this.

In the figure, three boundaries are highlighted to illustrate the roadway segmentation concept. The upper horizontal blue line identifies the horizon boundary, where the roadway meets the non-roadway portion of the image above it. The painted center lines on the roadway are also discovered; in the figure, this discovery is highlighted by the blue and red points overlaying the double yellow center line. The lower blue line splits the image horizontally at the point where the center (double yellow) lines intersect the left border of the image. Finally, we also determine the vanishing point of the roadway, using lines formed by edges present in the image; it is denoted by the red cross within a circle.

The three images in Figure 6 are: (a) a safe approach by a vehicle that does not cross into the safety zone, (b) an unsafe approaching vehicle that crosses into the safety zone, and (c) a vehicle passing in the opposite lane.

Figure 6: Safety Zone.

Together, the three boundaries and the vanishing point outline a bounding box where any valid rear-approaching vehicle detections are expected to appear. Therefore, the more expensive detection computation is reduced by restricting it to this region.
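A minimal sketch of the color-model segmentation step might look like the following, assuming a Gaussian color model fit to a seed region and a Mahalanobis-distance threshold. The threshold value, regularization, and function names are illustrative choices, not details taken from the paper.

```python
import numpy as np

def segment_road(img, seed, thresh=3.0):
    """Segment road pixels by color appearance.

    `img` is an (H, W, 3) float array; `seed` is an index (e.g., a pair
    of slices) selecting a region near the bicycle assumed to be road
    surface. A Gaussian color model (mean + covariance) is fit to the
    seed pixels, and every pixel whose Mahalanobis distance to the
    model falls below `thresh` is labeled road.
    """
    pixels = img[seed].reshape(-1, 3)
    mean = pixels.mean(axis=0)
    cov = np.cov(pixels, rowvar=False) + 1e-6 * np.eye(3)  # regularize
    inv = np.linalg.inv(cov)
    diff = img.reshape(-1, 3) - mean
    d2 = np.einsum("ij,jk,ik->i", diff, inv, diff)  # squared Mahalanobis
    return (d2 < thresh ** 2).reshape(img.shape[:2])
```

Refitting the model from the seed region on every frame, as the text describes, lets the segmentation adapt to changing pavement color and lighting.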

4.1.3 Vehicle Tracking

Once detected, approaching vehicles must be tracked so that the system can determine when an alert should occur and can resolve multiple detections. Figure 6 shows examples of vehicle tracking for approaching vehicles (red box on vehicles) and correct handling of departing vehicles (no red box). Multiple cars may also pass at the same time; such cases must be handled by the tracking framework.

Many techniques exist for tracking moving objects based on their appearance, shape, and feature points. We use an appearance-based approach, as it is more robust to noise from camera jitter. We use the method proposed in [32], which uses a Principal Component Analysis (PCA) based appearance model to track the detected car in an affine subspace. The method automatically learns and updates the appearance model starting from the first detection. This method works well for tracking objects with slight pose, scale, and illumination changes. In our case, the appearance, pose, and scale of the car change rapidly as it comes closer, due to the strong perspective effect. To handle this, we modified the above approach to be adaptive, updating the model parameters online from successive detections. Multiple cars can be tracked simultaneously by running one track per detected car, and detection ambiguities are resolved by a linear assignment based on the location and appearance similarity of the detected blobs. Finally, all tracks are managed based on their location and appearance: a track is a valid car track if and only if there is a detection close to the previous location in the next frame, and a track is deleted when its car leaves the scene.
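The linear-assignment step for resolving detection ambiguities can be sketched as follows. This toy version matches tracks to detections by location only (the paper also uses appearance similarity) and finds the minimum-cost matching by brute force, which suffices for the handful of simultaneous vehicles expected in this scenario; the function name is illustrative.

```python
import numpy as np
from itertools import permutations

def assign_detections(tracks, detections):
    """Associate new detections with existing tracks.

    `tracks` and `detections` are lists of (x, y) centroids. Finds the
    assignment minimizing total Euclidean distance by exhaustively
    trying every matching; returns sorted (track_idx, detection_idx)
    pairs. A Hungarian-algorithm solver would replace this at scale.
    """
    if not tracks or not detections:
        return []
    cost = np.array([[np.hypot(tx - dx, ty - dy) for (dx, dy) in detections]
                     for (tx, ty) in tracks])
    n = min(len(tracks), len(detections))
    best_total, best_pairs = float("inf"), []
    # try every ordered choice of n tracks against n detections
    for t_idx in permutations(range(len(tracks)), n):
        for d_idx in permutations(range(len(detections)), n):
            total = cost[list(t_idx), list(d_idx)].sum()
            if total < best_total:
                best_total, best_pairs = total, sorted(zip(t_idx, d_idx))
    return best_pairs
```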

4.1.4 Safety Zone

As each tracked vehicle approaches a biker, the Cyber-Physical bicycle system calculates the distance between the vehicle and the biker. The system maintains a configurable boundary (called the safety zone) around the bicycle, and utilizes this boundary as the determining threshold between a "safe pass" and an "unsafe pass". In both cases, the biker is alerted to the presence of the approaching vehicle; in the latter case, an additional alert is raised to warn the biker of the more dangerous encounter. This is demonstrated by the three images shown in Figure 6.
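The two alert levels reduce to a simple threshold rule, sketched below. The function and label names are illustrative; the three-foot default comes from the safety-zone description, and the zone radius is configurable as the text states.

```python
def classify_pass(distance_ft, safety_zone_ft=3.0):
    """Map a vehicle's distance from the bicycle to an alert level.

    Any approaching vehicle already triggers a notification; one that
    enters the configurable safety zone (three feet in the prototype)
    triggers the stronger warning, mirroring the yellow/red alerts of
    Figure 6.
    """
    if distance_ft < safety_zone_ft:
        return "unsafe-pass warning"   # vehicle violated the safety zone
    return "safe-pass notification"    # vehicle passing outside the zone
```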

Figure 6(a) shows the “safe pass” scenario. In the image, the yellow and white grid delineates the safety zone boundary. Since the vehicle in the image is passing the biker outside of the safety zone, a yellow warning is signaled in the image (yellow box in the upper right corner). This is equivalent to the audible notification that a biker would receive in this scenario to warn about the “safely” approaching vehicle. Alternatively, in


Audio Feature               Symbol  Domain  Description
Spectral Entropy            SPE     Freq    The entropy of the spectrum. Falls off sharply when there is a vehicle approaching.
Spectral Centroid           SPC     Freq    Weighted center of the spectrum. Increases when there is a vehicle approaching.
Root-Mean-Square Amplitude  RMS     Time    The per-frame audio energy. Used to drop frames to ignore irrelevant sounds, e.g., a parked bike. Vehicle approaches are often louder than other audio frames.
Zero Crossing Rate          ZCR     Time    Number of times the audio signal crosses zero within a frame. Typically higher for random noise.
Spectral Rolloff            SPR     Freq    Frequency bin below which 91% of the signal energy is contained. Increases slightly during vehicle approaches.
Spectral Flux               SPF     Freq    Relative change in the frequency weighted by magnitude. Exhibits a small increase at the onset of a vehicle approach.

Table 1: Audio Feature Descriptions.

Figure 6(b), we see an example of an “unsafe” car pass. In this image, the car crosses the safety zone boundary and a red alert is visibly raised by the system. This box is meant to represent an audible alert raised to notify the biker of the “unsafely” approaching vehicle.

4.2 Audio-Based Detection

In addition to video-based approaching vehicle detection, the Cyber-Physical bicycle also performs audio-based sensing and analysis to detect the presence of any rear-approaching vehicles. To sample sound from behind a biker, the bicycle is equipped with a rear-facing audio sensor (microphone), which continuously collects the ambient sound.

To inform this system, we make two observations. First, vehicles are clearly audible to a biker over the background wind noise when she turns her head; therefore, a wind-shielded audio sensor should be able to detect vehicular sound. Second, since sound from a vehicle is directional, the audio-based system should be able to discriminate between rear- and front-approaching vehicles. Furthermore, a vehicle that approaches from the rear produces a longer sound, since it takes more time to pass a biker than one approaching from the front in the opposite traffic lane. Figure 3 shows the block diagram for the audio processing subcomponents in green.

4.2.1 Audio Feature Extraction

Audio is continuously captured from the rear-facing microphone. The resulting stream is then broken into fixed-size frames for processing. We apply a Hanning window to each frame, which is then passed to the feature extraction processor. A number of features are extracted from each frame to form a feature vector. Some features are time-domain features, while others are drawn from the frequency domain. Together, they represent a number of characteristic audio features (commonly utilized audio features are described in [26, 28, 33]).

Table 1 provides a description for each audio feature used in the audio-based detection subsystem. To calculate the frequency-domain features, we first transform the signal using a Fourier transform (in our case, a standard Fast Fourier Transform). SPE, SPC, and ZCR all show a strong correlation with the sound of a rear-approaching vehicle, while the rest of the features reinforce classification accuracy. We also utilize the first-order time differentials of these features in classification to take advantage of the temporal nature of vehicle approaches.
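As an illustrative sketch (not the paper's Python 2.6/Numeric implementation), several of the Table 1 features can be computed with NumPy; the function name and the subset of features shown are choices made here:

```python
import numpy as np

def audio_features(frame, sample_rate=22050):
    """Compute a few of the Table 1 features for one audio frame.

    Illustrative sketch: the paper's pipeline uses the same quantities,
    but its exact implementation details are not reproduced here.
    """
    frame = np.asarray(frame, dtype=float)
    rms = np.sqrt(np.mean(frame ** 2))                   # RMS amplitude (time domain)
    zcr = np.count_nonzero(np.diff(np.signbit(frame)))   # zero crossings (time domain)

    windowed = frame * np.hanning(len(frame))            # Hanning window before the FFT
    mag = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    power = mag ** 2
    p = power / (power.sum() + 1e-12)                    # normalized power spectrum
    spc = float((freqs * p).sum())                       # spectral centroid (weighted center)
    spe = float(-(p * np.log2(p + 1e-12)).sum())         # spectral entropy

    # Spectral rolloff: lowest frequency below which 91% of the energy lies.
    spr = float(freqs[np.searchsorted(np.cumsum(p), 0.91)])
    return {"RMS": rms, "ZCR": zcr, "SPC": spc, "SPE": spe, "SPR": spr}
```

For a pure 1 kHz tone, the centroid and rolloff land near 1000 Hz and the RMS is near 1/sqrt(2), as expected.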

Figure 7 shows an example of audio feature behavior for two scenarios: one for a rear-approaching vehicle and one when no vehicle is present. From the figure, we can clearly observe a substantial difference in the audio signal feature response to each scenario. For the rear-approaching vehicle case, there is a rise in the SPC between frames 40 and 120. This directly corresponds to the time frame when the vehicle approaches and passes the biker. These results are indicative of the other audio features utilized, and we omit the other feature results for brevity.


Plot of the Spectral Centroid (SPC) feature over time. The two trends show the audio feature measurement for the “vehicle present” and “no vehicle present” situations.

Figure 7: Audio Feature Discrimination.

4.2.2 Detection Model Creation

Once all of the relevant audio features have been extracted, they are utilized to build a classifier to detect approaching motor vehicles. To build the classifier, we randomly select an equal number of audio feature vector instances from a set of annotated vectors. Each vector in the training set is annotated to place it into one of three classes: (i) front-approach, (ii) rear-approach, and (iii) no approach. For our classifier, we use a decision tree, due to its relative simplicity, classification speed, and accuracy.
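The paper trains a J48 decision tree in Weka; as a hedged stand-in, the same three-class setup can be sketched with scikit-learn's DecisionTreeClassifier (a substitution, since the paper does not use scikit-learn) on synthetic feature vectors:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy feature vectors [SPE, SPC, ZCR] — synthetic values, not trace data.
X = np.array([
    [4.1, 6500.0, 380.0],   # rear-approach-like frame
    [4.0, 6480.0, 390.0],   # rear-approach-like frame
    [5.2, 6300.0, 300.0],   # front-approach-like frame
    [5.3, 6290.0, 310.0],   # front-approach-like frame
    [6.0, 6250.0, 450.0],   # no-approach frame
    [6.1, 6240.0, 440.0],   # no-approach frame
])
y = ["rear", "rear", "front", "front", "none", "none"]

# Train offline; the fitted model is then queried frame by frame online.
clf = DecisionTreeClassifier(random_state=0).fit(X, y)
pred = clf.predict([[4.05, 6490.0, 385.0]])[0]   # frame resembling the rear class
```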

The classifier model is built offline and then used by the real-time detection algorithm to determine the classes of current audio samples. The real-time portion of the system continuously samples frames, extracts features, and classifies them according to this preconstructed classifier model. To reduce the variability (noise) of the classification output due to false positives and negatives on individual frames, the results of classification are fed into a higher layer that performs further discrimination.

4.2.3 Higher-Order Discrimination

This layer receives a stream of classifier results from the layer below. To reduce the effects of incorrectly classified individual frames, the results are smoothed using window-based moving averaging. This removes small fluctuations in results due to noise in either the original data stream or the classifier results, thereby improving the overall accuracy of the detector. Once a detection has been determined at this layer, an alert is generated, similar to the camera-based portion of the system, and ultimately propagated to the user.
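The window-based moving averaging can be sketched as follows; the window size and threshold are illustrative values, not the paper's tuned parameters:

```python
def smooth_detections(raw, window=5, threshold=0.6):
    """Smooth binary per-frame classifier outputs with a trailing
    moving average.

    A detection is reported only when the average over the last
    ``window`` frames reaches ``threshold``, suppressing isolated
    misclassified frames.
    """
    smoothed = []
    for i in range(len(raw)):
        lo = max(0, i - window + 1)
        avg = sum(raw[lo:i + 1]) / (i + 1 - lo)
        smoothed.append(1 if avg >= threshold else 0)
    return smoothed
```

A lone spurious positive early in the stream is suppressed, while a sustained run of positives comes through as a detection.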

4.3 Integrated Multi-Modal Detection

Finally, since both the video- and audio-based systems operate in parallel, the Cyber-Physical bicycle system combines the results of both modalities to further improve the accuracy of detection. This combination allows the system to perform an additional comparison based upon the lower-layer results to further filter false positives and catch false negatives by leveraging the diverse characteristics of the different modalities. Figure 3 shows the audio and video integration block (in orange) that produces the final situational determination. We combine both results in the following way.

We construct a binary state vector Sn = [An Vn an vn], where

an = current audio prediction ∈ {0, 1}

vn = current video prediction ∈ {0, 1}

An = 1 if Σi=n−w..n ai < low or Σi=n−w..n ai ≥ high, and 0 otherwise

Vn = 1 if Σi=n−w..n vi < low or Σi=n−w..n vi ≥ high, and 0 otherwise

w = a constant window size

We choose w = 5, low = 1, and high = 3 in our case. Conditioning on the different cases of Sn, an, and vn, we issue an alert. If the system deems the situation to be a threat to the safety of the biker, the appropriate alert is raised.
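Under the definitions above, the windowed flags and one plausible fusion rule can be sketched as follows. The paper does not spell out its full case table on Sn, so `combined_alert` is an illustrative reading, not the authors' exact rule:

```python
def window_flag(preds, n, w=5, low=1, high=3):
    """An / Vn: 1 when the windowed sum of per-frame predictions is below
    ``low`` or at least ``high`` (the modality is in a confident, stable
    state); 0 in the ambiguous middle band."""
    s = sum(preds[max(0, n - w):n + 1])
    return 1 if (s < low or s >= high) else 0

def combined_alert(audio, video, n, w=5, low=1, high=3):
    """Raise an alert when at least one modality is both confident
    (windowed flag set) and currently predicting a vehicle.

    Illustrative fusion rule over Sn = [An Vn an vn]; an assumption,
    not the paper's exact case analysis.
    """
    A = window_flag(audio, n, w, low, high)
    V = window_flag(video, n, w, low, high)
    a, v = audio[n], video[n]
    return bool((A and a) or (V and v))
```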

4.4 Implementation

The implementation of video detection and tracking consists of two components: optical flow computation and vehicle tracking. To build the optical flow component, we used the well-known, open-source optical flow library implementation available from [34]. We implemented this component in C++ with the NVIDIA CUDA library (version 2.3), and built it to utilize the GPU to perform image processing. Finally, our tracking code was implemented in MATLAB 7.10 using the base code provided at [32].

The audio pipeline was built as two components as well. The feature extraction module is implemented in Python 2.6. Audio capture is implemented using the PyAudio module, and we use the Python Numeric module to perform the feature and FFT calculations. The feature vector classifier is built using the Weka machine learning toolkit (version 3.6.2) and is implemented in Java 1.6. Weka is used to train the classifier on a subset of the roadway cycling traces. We use a decision tree classifier (J48) model, and built a Java server to handle the classification tasks.

We install and run the Cyber-Physical bicycle system on an HP Mini 311 netbook. We chose this as our development platform because its hardware closely matches the embedded hardware utilized in various multimedia applications. The specifications of the netbook are: Intel Atom N280 1.67 GHz CPU, 3 GB RAM, NVIDIA ION GPU, 80 GB SSD, and an internal 6-cell Li-ion battery. The netbook weighs only 3.26 lbs. The GPU is composed of 16 CUDA cores and 256 MB of dedicated graphics memory.

5 Evaluation

Our experimental evaluation of the Cyber-Physical bicycle system prototype addresses three questions:

• How accurate are video- and audio-based sensing techniques in detecting rear-approaching vehicles? (Section 5.2)

• Can detection be performed in real time? (Section 5.3)

• What are the power requirements of the system? (Section 5.4)

All experiments are executed using our prototype implementation on the hardware specified in Section 4.4.

5.1 Roadway Cycling Traces

To support repeatable experimentation, we collected over 3 hours of real roadway cycling traces, amounting to roughly 10 GB in size. To gather these traces, we mounted a rear-facing digital video recorder (Sony Handycam DCR-SX40), which includes a camera and microphone, to an ordinary road bicycle (Trek FX 7.5). The camera collects video and audio recordings while a biker rides the bicycle along a set of typical rural bike routes in central New Jersey. Once collected, the traces are viewed and annotated. We manually annotated every interaction between the biker and a vehicle (both approaching and departing) using the timestamps of each individual trace. In the traces, there are 52 incidents of rear-approaching vehicles and 135 incidents of front-approaching vehicles. The average time that audio is heard for a rear-approach is 2.96 seconds. Similarly, cars are clearly visible in the video for 3.81 seconds on average. All traces were collected during normal daylight hours.

Although we commonly refer to audio and video segments as frames, this term can be ambiguous when applied so broadly. For the purposes of our implementation and evaluation, and to disambiguate terms, we define both here. A video frame is defined as a single 3-channel color image with 8-bit depth and a resolution of 80 x 90 pixels. For audio, we define a frame to consist of 200 ms worth of data, with consecutive frames overlapping each other by 50%. Our audio data is sampled as a single channel at 22050 Hz with an 8-bit sample size.
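The audio frame definition above translates directly into a framing routine; a minimal sketch (the function name is assumed):

```python
def frame_audio(samples, sample_rate=22050, frame_ms=200, overlap=0.5):
    """Split a mono sample stream into fixed-size frames with 50% overlap,
    matching the frame definition used in the evaluation (200 ms frames
    at 22050 Hz, consecutive frames overlapping by half)."""
    frame_len = int(sample_rate * frame_ms / 1000)   # 4410 samples per frame
    hop = int(frame_len * (1 - overlap))             # 2205-sample hop
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]
```

One second of audio (22050 samples) yields nine overlapping 4410-sample frames.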


Mode       Result   Positives   Negatives
Video      True     18          N/A
           False    7           1
Audio      True     17          N/A
           False    2           2
Combined   True     18          N/A
           False    4           1

Table 2: Detection Accuracy.

5.2 Accuracy

Accuracy is a critical metric for our system. Since the key function of the system is to detect rear-approaching vehicles for a biker, the system must be accurate enough to instill confidence. In this section, we evaluate the accuracy of the video- and audio-based detectors individually, and then evaluate their combination as a single multimodal detector. In all cases, we replay our annotated roadway cycling traces by streaming them through the system as if they were being directly captured from the video and audio sensors.

All results in this section are presented as confusion matrices. True positives (TP) represent the cases where a rear-approaching vehicle is correctly identified. True negatives (TN) represent the cases where the absence of a rear-approaching vehicle is correctly detected. False positives (FP) occur when something is misclassified as a rear-approaching vehicle. A false negative (FN) occurs when a rear-approaching vehicle should have been detected, but was not. Finally, we define accuracy as: Accuracy = TP / (TP + FP + FN).
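The accuracy definition can be checked against the counts reported in Table 2:

```python
def accuracy(tp, fp, fn):
    """Accuracy metric as defined above: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)

# Sanity check against the Table 2 rows.
video = 100 * accuracy(18, 7, 1)   # ≈ 69.2% (video-only)
audio = 100 * accuracy(17, 2, 2)   # ≈ 80.9% (audio-only)
both = 100 * accuracy(18, 4, 1)    # ≈ 78.3% (combined, alert level)
```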

5.2.1 Video Detection Accuracy

Table 2 presents the results for the video-based detector. From this table, we observe that across all tested cases, only 1 rear-approaching vehicle was not detected (false negative), while 7 false alerts were raised (false positives). Based upon these results, we calculate the overall accuracy of this method to be 69.2%.

To better understand the sources of incorrect classification, we reviewed the specific trace sequences that were incorrectly handled by the system. In one of the instances, a biker is being followed by another rider, who occludes a rear-approaching vehicle. This causes a false negative. A second example illustrates a false positive case. In this example, a car approaches a biker and is detected by the system correctly. Prior to passing the biker, though, the car slows down and takes a right turn off the roadway. This causes a false positive, since the tracker remains in the scene and incorrectly tracks a section of roadway. Finally, a different false positive occurs due to a sudden jerk to the system caused by a section of uneven roadway surface. The fast lateral motion causes the location of the car to shift by hundreds of pixels, again confusing the tracker.

5.2.2 Audio Detection Accuracy

Table 2 also presents the results for the audio-based detector. From the table, we observe that only 2 rear-approaching vehicles were not detected, while 2 false alerts were generated. We calculate the overall accuracy to be 80.9% for the individual audio case.

After closer inspection of the results, we found that the first false negative is due to a very slowly approaching car, which brakes to take a turn away from the bike before ever reaching the biker. This is a case where we were conservative in annotating the occurrence in the trace data, but the car never actually gets close enough to be a danger to the biker. In another interesting instance, a car approaches very slowly from behind while a flurry of vehicles passes the bike from the opposite direction. These rapid car passes drown out the sounds of the slowly passing car behind, confusing the classifier. This was responsible for one false positive and one false negative. The final false positive was due to the noise generated as the bicycle rode across a bad patch of road.


(a) Vehicle 1. (b) Vehicle 2. (c) Vehicle 3.


(d) Detection Timeline.

Three vehicles approach a biker from behind in succession. The Ground Truth graph plots the approaches over time. The second and third graphs plot the individual Audio and Video detection results. The fourth graph plots the detection results for the combined multimodal approach.

Figure 8: Multimodal Accuracy - Multiple Vehicles.

5.2.3 Combined Multimodal Detection Accuracy

Finally, Table 2 also presents the results for the combined multimodal detector. For the combined case, we present the alert-level results. From the table, we observe the overall accuracy to be 78.3%. This represents a 9.1% increase and a 2.6% decrease in accuracy relative to video and audio, respectively. Aside from the effects on accuracy, the combination improves the real-time alerting function, as will become evident in Section 5.3.

Three example scenarios from the traces provide a closer look at the results for the multimodal case. The first example is shown in Figure 8. In this scenario, three vehicles approach a biker from the rear, shown in the images in Figures 8(a), 8(b), and 8(c). The figure also presents the detection results for the scenario (Fig. 8(d)). We make a few observations from this example. First, although there are large portions of agreement between both detectors, there are times when they do not agree. This happens when one detector or the other is more accurate. It is this non-agreement that provides the boost in accuracy to the combined detector. Second, for this first example scenario, audio is unable to distinguish the closely packed individual vehicles, while video is successful. In this example, as well as others omitted for brevity, the Combined A/V (multimodal) detector performs closer to the Ground Truth than either the Video or Audio detector alone.

5.3 Real-Time Performance

Accurate detection is only useful to a biker if the system can provide alerts in a timely fashion. In this section, we evaluate the timeliness of the individual audio and video detection processing components, and then measure the performance of the combined multimodal detector. We define timeliness in terms of the average number of seconds of warning the system provides to the biker prior to a vehicle encounter. We also present the potential timeliness as the percentage of the total possible time when a vehicle could be detected by the system, i.e., the difference between the first appearance of the vehicle and the time it passes the biker.

All experiments are measured by executing the real-time Cyber-Physical bicycle prototype, using our


roadway cycling traces. Similar experiments were also executed while capturing video and audio data directly from the sensors to validate our trace-based results. We omit these results, and present only the trace-based results, for brevity.

5.3.1 Video Performance

In this section, we investigate the end-to-end performance of the video-based detection processing. For the video case, two components form the critical performance path: the optical flow processing and vehicle tracking subsystems. In the following experiments, we measure the processing costs of both subsystems.

Plot of the time (in ms) to perform optical flow processing on individual video frames. Includes instantaneous and smoothed results.

Figure 9: Optical Flow Performance.

The first video performance experiment measures the latency of processing a frame using optical flow techniques. Figure 9 presents the results of our trace-based experimentation. In the figure, the results are presented as the frame processing latency (in ms) for a sequence of frames from a portion of the roadway cycling traces. From the figure, we observe that each frame in the experiment is processed in real time. Although the latency for frame processing fluctuates between 250 and 315 ms, the frame rate never drops below 3 frames per second (FPS). We also observe, by focusing on the smoothed data (dashed line), that optical flow imposes a relatively constant and predictable processing cost.

The second video performance experiment examines the cost of vehicle tracking processing. The results are presented in Figure 10. In the figure, we measure the instantaneous FPS rate as each frame is processed in the vehicle tracking component. From the figure, we observe that the frame rate varies between 3.7 and 1.3 FPS. We also observe that performance is quite stable around those two values. The reason for this is that when the component is tracking vehicles, frame processing is more expensive, and we experience a subsequent drop in FPS rate. Two examples of this can be observed in the figure for frames 150-175 and again for frames 225-260. Under these performance conditions, we measure the real-time video-based alerting to occur an average of 3.5 seconds prior to a vehicle encounter, which is 92% of the potential time.

5.3.2 Audio Performance

In this section, we investigate the end-to-end performance of the audio-based detection processing. For the audio case, two components form the critical performance path: audio feature extraction and feature vector classification. In the following experiment, we measure the processing costs of both components.

To measure the isolated performance of the audio-based detection components, we use only the audio streams from the roadway cycling traces. For each audio frame, we measure the latency of feature extraction for each feature in the feature vector and the latency of classification for each vector. The results of this experiment are presented in Figure 11. In the figure, there are eight bars. Each of the first seven bars represents the


Plot of the number of frames processed per second (FPS) through the vehicle tracking component of the video-based detection subsystem. Includes instantaneous and smoothed results.

Figure 10: Vehicle Tracking Performance.

rate (in FPS) for one of the audio features calculated from the source frames (see Table 1 for details). The last bar is the rate (in FPS) of feature vector classification. From the figure, we observe that all components in the audio processing pipeline execute well within real-time limits. In fact, the slowest component, FFT, processes nearly 100 frames per second. Under these performance conditions, we measure the real-time audio-based alerting to occur an average of 1.8 seconds prior to a vehicle encounter, which is 59% of the potential time.

5.3.3 Combined Multimodal Performance

Finally, in this section we investigate the performance when utilizing a multimodal configuration. In this scenario, both audio and video processing execute concurrently, processing data from the same audio/video stream in parallel.

Similar to the previous experiment, we measure the frame rate for each component in the pipeline and report the results (in FPS). We consider three components: (i) optical flow processing, (ii) video-based vehicle tracking, and (iii) audio processing. Figure 12 presents the results of this experiment. Although all three components exhibit reduced performance, due to resource competition under concurrent processing conditions, the overall notification time is 3.5 seconds on average, which equates to 92% of the potential. In short, the combined system matches the timeliness of the video-based subsystem while losing only a small amount of accuracy.

5.4 Power Requirements

Since a bicycle is a mobile platform, we must consider the power requirements of the Cyber-Physical bicycle system. A system that provides accurate real-time detection is only useful when operating, so we need to understand the energy burden the system imposes. In this section, we evaluate power requirements in two ways. First, we measure the absolute power rates compared to a baseline idle system. The second experiment measures the battery discharge rate as the system executes various components. All experiments are measured by executing the real-time Cyber-Physical bicycle prototype. To capture the measurements, we utilize the BatteryBar utility [7].

Table 3 presents the results for the power consumption rate measurements. All measurements are in Watts. From the table, we observe that Audio places a modest increase in power requirements over the idle system, while Video nearly doubles it due to the inclusion of GPU processing. In fact, the Video results are comparable to the requirements of a typical movie player (VLC Media Player [14]). Finally, we observe that the Multimodal scenario is also comparable to the base Video scenario. This is due to the fact that both


Plot of the number of frames processed per second (FPS) through the audio processing components of the audio-based detection subsystem. The first seven bars represent the audio feature extraction performance (FPS) for each feature in Table 1. The last bar is the performance of feature vector classification.

Figure 11: Audio Component Processing Performance.

Component      Power Consumption Rate (Watts)
Idle           6.4 (0.1)
Audio          8.5 (0.1)
Movie Player   12.4 (0.1)
Video          14.2 (0.3)
Multimodal     14.4 (0.2)

Power usage for each processing component. The baseline power usage is Idle. Audio, Video, and Multimodal represent individual power consumption rates for the audio-based, video-based, and combined detection modes. Movie Player is a standard MPEG movie player, for comparison purposes. Results are the mean of five measurements (in Watts), with standard deviations in parentheses.

Table 3: Power Consumption Rates.

the CPU and GPU are being utilized under Video. At that level of performance, the hardware is already in a high-performance mode, and the additional processing overhead of Audio causes only an incremental increase in power requirements due to higher CPU and memory utilization.

Figure 13 presents the battery discharge rates for the five different scenarios. In the figure, we plot the percentage of battery depletion over time as each scenario is executed. As expected, these results mirror the power consumption rates, with Video and Multimodal exhibiting similar battery discharge rates. Since both are close to the rate of a typical movie player, we expect the battery lifetime while executing our system to be comparable to that of a user watching a movie. Based upon these measurements, and assuming a linear battery discharge rate, we estimate the battery lifetime of our system executing in multimodal mode to be approximately 5 hours. Of course, battery discharge rates are non-linear, but that is not likely to change the fact that the estimated lifetime is on the order of typical roadway bicycle ride durations [8, 12]. Moreover, since a common application for most modern netbook users is watching movies (operating on battery power), experience suggests that a typical netbook battery will support the Cyber-Physical bicycle system for the required duration.
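The roughly 5-hour estimate follows from linear extrapolation of the observed discharge slope; a sketch, using illustrative readings in the spirit of Figure 13 (a drop from 78% to 68% of battery over 35 minutes) rather than exact measured values:

```python
def linear_lifetime_hours(frac_start, frac_end, elapsed_min):
    """Extrapolate full-battery lifetime from an observed discharge slope,
    assuming (optimistically, as the text notes) a linear discharge."""
    drop_per_min = (frac_start - frac_end) / elapsed_min
    return 1.0 / drop_per_min / 60.0

# Illustrative numbers, not measurements from the paper.
est = linear_lifetime_hours(0.78, 0.68, 35)
```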


Plot of the number of frames processed per second (FPS) through each processing component of the multimodal detection system.

Figure 12: Multimodal Processing Performance.

6 Discussion and Future Work

6.1 User Study

The central goal of this work is to reduce the cognitive overhead of a biker, allowing her to focus attention on bicycle handling and the roadway ahead. Evaluating this is a challenging problem, as it requires thorough coverage of different biker skill levels, riding styles, roadway and route characteristics, environmental conditions, and user interface issues. As such, we acknowledge the importance of and need for a full user study, and plan to conduct one in the future as a separate but related piece of research. We envisage this to include two components. The first is a systematic enumeration and categorization of the relevant contributing factors to be included in the study. The second is a broad study over many hours of system evaluation in the field with real bikers on typical rides. Although we plan to pursue a user study, this is not meant to diminish the trace data used in this study. In fact, the data used in our experiments is real data of the same quality that would be collected in real time during a user study.

6.2 System Limitations

Detection Accuracy. Although the results demonstrate the accuracy of our system, they also leave room for improvement. Our design is biased towards eliminating false negatives, since we consider an unannounced rear-approach to be worse than a false positive. We believe that there is a trade-off here that should be explored as we attempt to further optimize the system. Additionally, we are investigating additional computer vision techniques, for example profile analysis, to allow our system to recognize the front profile of a vehicle as a means of further validating the presence of a rear-approaching vehicle.

Real-Time Performance. Although our system meets the requirements for real-time alerting, in multimodal detection it comes close to fully utilizing the resources of our prototype platform. This ultimately places an upper bound on the frame rate (FPS) that the system can process in real time. Since the video capture rate for a typical video camera can be 30 FPS or higher, and our real-time prototype processes between 1.5 and 10 FPS, there is potential for improvements in accuracy through optimization. In conjunction with this optimization, we plan to explore the inherent trade-offs between video quality, accuracy, and performance.

Another direction that we envisage is to introduce adaptivity to the multimodal system. The idea is to better understand the environmental conditions that affect the accuracy of video- and audio-based detection, and to adaptively adjust the level of processing for each technique. For example, under conditions of low visibility, the system could reduce the frame rate for video and rely more on audio-based detection.


Plot of the battery discharge rates for each processing component. The baseline power usage is Idle. Audio and Video represent individual discharge rates for the audio- and video-based detection components executing in isolation. Multimodal represents concurrent audio and video-based detection execution, and Movie Player is a standard MPEG movie player, for comparison purposes.

Figure 13: Battery Discharge.

Conversely, under conditions of high levels of ambient noise, the system could reduce the frame rate foraudio.

Limited Visibility Situations. Under conditions of good visibility, our system performs very well, as described in Section 5. In the scope of this work, we have not considered the possible scenarios of limited visibility conditions. At this time, our system relies upon its multimodal nature to accurately detect rear-approaching vehicles. Clearly, audio-based detection does not depend on visibility, even working under conditions of complete darkness. Since we do rely on multimodal combination to achieve the best possible accuracy, we should also consider variable visibility scenarios. We plan to do so in two ways. First, we will include such tests as part of a larger user study. Second, we will purposely gather additional traces from a wide variety of conditions, to provide a more comprehensive data set for repeatable experimental trials.

Sensor Calibration. In this work, we focus on one specific implementation of the Cyber-Physical bicycle system. One practical issue that must be addressed to enable a broad deployment of this system on generic hardware is sensor calibration. We have largely ignored this issue in this paper, since we rely on specific, consistent hardware, which eliminates it. If we add the goal of broadly deploying the system software on diverse generic hardware, including a variety of different types of cameras and microphones, then we must address system-sensor calibration. For the camera, the focal length must be discovered to allow accurate distances between bikers and vehicles to be determined. For the microphone, the system must regenerate the classifier models based upon sound samples captured from the target sensor. Although we have purposefully left this out of the scope of this paper, we plan to address it in the future.
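The dependence on focal length can be illustrated with the standard pinhole-camera relation: the distance to an object of (assumed) known physical width follows from its apparent width in pixels. The numbers below are illustrative, not measurements from our system.

```python
# Pinhole-camera sketch of why camera calibration matters: distance to a
# vehicle of known real-world width is estimated from its width in pixels.
# focal_px (focal length in pixels) is exactly the per-camera quantity that
# calibration must discover.

def distance_m(focal_px, real_width_m, width_px):
    """Estimate object distance (meters) from apparent size, pinhole model."""
    return focal_px * real_width_m / width_px

# Example: a car ~1.8 m wide spanning 90 px, with a 700 px focal length,
# is estimated at 700 * 1.8 / 90 = 14.0 m.
```

An uncalibrated focal length scales every distance estimate by the same unknown factor, which is why per-device calibration is unavoidable for generic hardware.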

6.3 Power Generation

Although we have investigated system power utilization as part of our evaluation, we have assumed that all power is supplied by the platform's integrated battery. This completely ignores the presence of "green" power. There are at least three forms of power present that we may tap to supplement the system. First, as a biker pedals, she generates power, which can be captured by specialized bicycle wheel hubs [13]. Such hubs can generate up to 6 Watts of continuous power, and are commonly used by bikers to power headlamps. Second, as a bicycle moves between 13 and 20 mph, an opposing wind force is generated, which can be harnessed by small, attachable wind turbines [10] to capture power. Finally, since bicycling is a fair-weather, outdoor activity, solar energy is typically available and can also be harnessed to capture ambient power for the system. Together, these three sources of power might be utilized for battery charging.
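A rough, illustrative budget shows how much a dynamo hub alone could help. The 6 W hub figure comes from the text [13]; the battery capacity and average system draw below are hypothetical values chosen only to make the arithmetic concrete.

```python
# Back-of-envelope supplemental-power budget. Only the 6 W dynamo-hub
# figure is from the text; battery capacity and system draw are assumed.

battery_wh = 25.0   # hypothetical netbook battery capacity (Wh)
system_w = 10.0     # hypothetical average draw of multimodal detection (W)
hub_w = 6.0         # continuous power from a dynamo hub while pedaling (W)

unassisted_h = battery_wh / system_w          # battery alone
assisted_h = battery_wh / (system_w - hub_w)  # hub offsets part of the draw

print(unassisted_h, assisted_h)  # 2.5 6.25
```

Under these assumed numbers, the hub alone would more than double the ride time, and wind or solar input would extend it further.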

6.4 Roadway Hazard Detection

There are numerous highly-visible obstacles that may appear along a biker's path, such as tree branches or parked cars. Such obstacles are typically observable by a forward-facing biker, and do not require any Cyber-Physical aid. However, there is a class of obstacles that are less observable, yet pose a more serious hazard to biker safety. For example, under normal conditions, a biker being passed by a slow-moving motor vehicle may not be placed in a particularly unsafe situation. On the other hand, if that same biker happens to be riding over a cluster of potholes while being passed by the motor vehicle, there is a greater risk of collision due to the increased chance that the biker may lose control of her bicycle. Neither the roadway condition nor the slow-moving vehicle poses an individual threat; it is the combination that places undue risk on the biker. As future work, we intend to investigate the possibility of performing roadway sensing to continuously measure roadway surface conditions. We plan to use a combination of accelerometer, video, and audio-based sensing to capture the motion of a bicycle as it travels along the roadway, and to identify visible roadway anomalies. Accurate roadway sensing from a moving motor vehicle has proven difficult, yet tractable [22]. Compared to motor vehicles, bicycles pose additional difficulties: they are substantially lighter, and therefore more susceptible to slight perturbations in motion.
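The accelerometer half of the proposed roadway sensing could start from something as simple as a variance test over a short window of vertical-acceleration samples; the threshold below is a placeholder that would have to be learned from real rides, not a value from our evaluation.

```python
# Minimal sketch of surface-roughness detection from accelerometer data:
# flag a window as "rough" when the variance of vertical acceleration
# exceeds a (hypothetical) threshold. Real potholes produce large spikes
# around the ~9.8 m/s^2 gravity baseline.

def rough_surface(samples, threshold=4.0):
    """samples: vertical acceleration (m/s^2) over a short window."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return var > threshold

smooth = [9.8, 9.9, 9.7, 9.8]       # near-constant gravity reading
potholes = [9.8, 15.2, 3.1, 14.0]   # large excursions from the baseline
```

Because bicycles are so light, a production detector would likely need to normalize for speed and rider-induced motion before thresholding.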

6.5 CyberPeloton (Platoons of Bicycles)

Typically, bikers ride in groups called pelotons. As bikers collect into groups, there is an opportunity for the Cyber-Physical bicycle system to take advantage of this close proximity to improve safety, share processing load, and provide additional social functionality. For example, as a vehicle approaches from behind, the last bicycle in the group can pass detection alerts forward to other bikers in the peloton. Similarly, a bicycle in front may perform roadway hazard detection and pass alerts back to those behind. Two levels of support are required to achieve this. The first is functionality to automate the formation of CyberPelotons whenever bikers are in close proximity to each other. Beyond formation, the system must also support intra-group signaling for both high-priority alerts and lower-priority signals.

CyberPeloton formation involves a number of challenges that must be overcome. First, Cyber-Physical bicycles must perform real-time proximity detection to determine when there are opportunities to form groups. Second, once a group has been formed, fine-grained relative positioning must be determined¹ and group consensus achieved to ensure that each member of a CyberPeloton agrees that it is a member and has an accurate view of the group positioning topology. Third, support is required to handle load-sharing functionality within a CyberPeloton. For example, in order to elect a member to perform roadway hazard detection and alerting, the group must agree on the member that is at the front-most position in the group. The same applies when electing a member of the group to handle automated motor vehicle detection. Moreover, since bicycles in a group may change relative positions frequently, this is a continuous task that must be performed efficiently.
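One simple way to make the front-most election agree across members, once the positioning topology is shared, is a deterministic rule applied by every bike to the same view; the message format and tie-breaker below are assumptions for illustration.

```python
# Sketch of front-most-member election in a CyberPeloton. Assume each bike
# broadcasts (bike_id, progress) where progress is its distance along the
# shared route. Every member applies the same deterministic rule to the
# same view, so all members reach the same answer without extra rounds;
# ties are broken by bike id.

def elect_front(positions):
    """positions: dict of bike_id -> distance along route (meters)."""
    return max(positions, key=lambda b: (positions[b], b))

peloton = {"alice": 120.4, "bob": 121.9, "carol": 121.9}
# bob and carol are level; the id tie-breaker deterministically picks carol
```

Since relative positions change continuously, the election would be re-run whenever the shared view updates; the deterministic rule keeps that cheap.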

6.6 Automated Incident Detection

This paper has dealt with biker safety from an accident-avoidance perspective. It is also important to consider biker safety from an accident-response perspective. Except for the most fortuitous situations where no injury occurs, an accident that involves a bicycle and a motor vehicle will likely require immediate medical attention for the biker. Considering that a substantial number of at-fault motorists were never even identified [4, 5], it is unlikely that a biker could rely on the motorist to react in the biker's interests. A future goal of ours is to automate the detection of such situations and react accordingly. Additionally, since prosecuting hit-and-run motorists is difficult due to the inherent lack of actionable evidence (e.g., license plate number, make and model of car, etc.), we have the complementary goal of gathering such evidence.

Once an incident has been correctly detected, it must be reliably reported to the appropriate authorities. Since the Cyber-Physical bicycle system is equipped with wireless communication technology, notifying authorities is straightforward in most situations. Considering the potential severity of the incident, though, we must also consider the exceptional case when traditional wireless communication fails (e.g., poor cellular signal strength). In this case, the system may need to have alternate signaling methods available. Investigating these methods is a necessary part of the scope of this topic. Finally, once the proper authorities have been contacted, the correct information must be communicated in a data-sensitive manner. Since we are attempting to preserve evidence that would otherwise be unavailable to authorities, the authenticity and integrity of the data are paramount.

¹Bikers frequently ride within 1 foot of each other, well within the error range of common GPS.
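The integrity requirement on evidence records can be sketched with a message authentication code; everything below (the key-provisioning scheme, the record format) is an illustrative assumption, and a real deployment would more likely use public-key signatures so the device need not share a secret with the verifier.

```python
# Sketch of tamper-evident evidence capture: each evidence record is sealed
# with an HMAC under a key provisioned on the device; an authority holding
# the same key can verify the record was not altered after capture.

import hmac
import hashlib

DEVICE_KEY = b"provisioned-at-manufacture"  # placeholder secret

def seal(record: bytes) -> bytes:
    """Compute an integrity tag for one evidence record."""
    return hmac.new(DEVICE_KEY, record, hashlib.sha256).digest()

def verify(record: bytes, tag: bytes) -> bool:
    """Constant-time check that the record matches its tag."""
    return hmac.compare_digest(seal(record), tag)

evidence = b"2010-08-01T14:03:22Z lat=40.52 lon=-74.46 clip=incident.mpg"
tag = seal(evidence)
```

Sealing each record at capture time, before any transmission, is what makes later tampering detectable regardless of which signaling path carried the report.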

6.7 Safe Route Planning

A more proactive measure toward accident avoidance is to incorporate roadway safety metrics directly into bicycle route planning. The goal would be to allow users to map out potential bike routes and then have the system quantify the safety of each route and suggest alternate paths based upon safety criteria. This would allow a user to make a direct quantitative comparison of the relative safety of different bike routes. Today, this can only be determined through biker experience, documented opinion, and slim anecdotal evidence.

From a web-services perspective, there currently exist a number of cycling-oriented services that allow bikers to form social groups [8, 12]. None of these services includes safety as a first-order property in route planning. Although bikers may directly share qualitative experiences with each other regarding roadway safety and route planning, none of these services attempts to include quantitative analysis as a weighting factor.
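Safety as a first-order routing property can be sketched by folding a per-segment hazard score into the edge cost of an ordinary shortest-path search; the road graph, hazard scores, and weighting parameter below are invented for illustration.

```python
# Sketch of safety-weighted route planning: Dijkstra over a road graph
# whose edge cost blends distance with a hazard score in [0, 1]. A higher
# alpha penalizes hazardous segments more strongly.

import heapq

def safest_route(graph, src, dst, alpha=5.0):
    """graph: {node: [(neighbor, dist_km, hazard)]}. Returns (cost, path)."""
    pq, seen = [(0.0, src, [src])], set()
    while pq:
        cost, node, path = heapq.heappop(pq)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nbr, dist, hazard in graph.get(node, []):
            if nbr not in seen:
                heapq.heappush(pq, (cost + dist * (1 + alpha * hazard), nbr, path + [nbr]))
    return None

roads = {
    "home": [("highway", 2.0, 0.9), ("park", 3.0, 0.1)],
    "highway": [("office", 2.0, 0.9)],
    "park": [("office", 3.5, 0.1)],
}
# The longer park route wins: 3.0*1.5 + 3.5*1.5 = 9.75, versus
# 2.0*5.5 + 2.0*5.5 = 22.0 for the shorter but hazardous highway route.
```

The hazard scores themselves are exactly what the quantitative analysis proposed above would supply, aggregated from sensed rides rather than anecdote.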

7 Related Work

The closest work to ours in both spirit and domain is the BikeNet [17] project. In this work, the authors utilize a suite of sensors to collect various types of environmental data, from air quality to coarse-grained motor vehicle traffic flow (using an embedded magnetometer), as an indicator of environmental conditions. Although this project was the first to suggest applying sensors to bicycles, its application domain was biker fitness, and it did not target the equally challenging problem of biker safety. In the remainder of this section we review work related to ours in the areas of video and audio-based detection techniques.

7.1 Video Detection Techniques

The current state of the art with respect to bicycle accident detection is the Cerevellum [9] digital rear-view mirror product. Aside from providing a continuous video-based view of the situation behind a biker, it also detects when a bike is struck (using simple accelerometer measurements) and will store the last 30 seconds of video on local flash storage. It does not, however, contact authorities on behalf of the biker, nor does it provide any digital signature of the evidence. It simply stores a small 30-second video, without any further processing.

Computer vision technologies have been successfully used to detect moving objects, such as people and vehicles, in many application domains. For example, in the surveillance domain, many algorithms have been developed for moving-object detection and tracking [18]. In a closely related domain, many computer vision systems have been developed to assist automobile drivers. For example, systems have been developed and deployed to detect approaching cars in blind spots using cameras and radar sensors [31]. Systems have been developed to detect and recognize traffic signs [20] and to detect crossing pedestrians [19]. Several systems have also been developed to detect lanes, and to detect the vehicle ahead in the same lane and keep a safe distance, e.g., [25, 35, 29, 24]. Integrated systems have also been developed for autonomous driving, e.g., [23]. In the 2005 DARPA Grand Challenge, several teams competed in developing autonomous vehicles to drive on 211 km of desert roads.

Despite the considerable effort spent integrating vision systems into automobiles, almost nothing has been done to develop vision systems for bicycles, where they are highly needed for bikers' safety. This is because it is quite challenging to develop a low-cost, power-efficient vision system for biker assistance.


7.2 Audio Detection Techniques

For audio, little to no work has been completed in automated motor vehicle detection. SoundSense [27] utilizes smart phones to identify the context of a user's life as he performs daily tasks, keeping an automated log of those tasks and of encounters with others. SoundSense draws on other work in the area of audio discrimination [26, 28, 33], seeking to automatically categorize audio events when detected, through machine learning techniques applied to a well-known set of audio signal features.

As with camera-based sensing, audio techniques have not yet been applied to automated motor vehicle detection. The previously described domains impose less of a real-time constraint; for CyberBike, the constraints are hard, since we seek to minimize detection latency. Additionally, false positives and negatives have substantially higher consequences in this new domain, and must be reduced to as close to zero as possible, for safety purposes. Finally, the system we propose cannot rely on users to categorize sounds after detection occurs. The system must be able to recognize a broad range of motor vehicle sounds to cover the most common cases.

8 Conclusion

We introduced the Cyber-Physical bicycle, a system that augments every-day bicycles with video and audio sensors to perform automated rear-approaching vehicle detection. The purpose of this system is to directly improve the safety of bikers as they travel on roadways, by reducing the cognitive overheads presently associated with roadway bicycle riding. To the best of our knowledge, our system is the first (i) to apply video and audio-based sensing techniques to bicycles on roadways, (ii) to apply advanced video and audio processing techniques to perform automated vehicle detection from a bicycle, and (iii) to propose novel computer vision techniques applied to the specific domain of biker safety.

The experimental evaluation of our prototype Cyber-Physical bicycle system demonstrates the feasibility of this system, which is able to perform real-time rear-vehicle detection from a bicycle with high accuracy. Finally, we presented a number of future directions for this work, which further support the breadth and importance of this problem domain.

References

[1] 11 Most Bike Friendly Cities in the World. http://www.virgin-vacations.com/site_vv/11-most-bike-friendly-cities.asp.

[2] A Quick History of Bicycles. http://www.pedalinghistory.com/PHhistory.html.

[3] My Central Jersey Traffic News Forum - Cyclist Buffer Laws Topic. http://www.mycentraljersey.com/apps/pbcs.dll/section?category=PluckForum&plckForumPage=ForumDiscussion&plckDiscussionId=Cat%3ac3e1733f-97fe-495c-963c-5d678129ebcaForum%3a5d77bdaf-be83-41e1-afc9-eed-560e032daDiscussion%3a972cdf6f-4dc6-4fb2-9f04-32b0c4c4f52c.

[4] Traffic Safety Facts 1996: Pedalcyclists. National Highway Traffic Safety Administration (NHTSA), U.S. Dept. of Transportation, 1996.

[5] Traffic Safety Facts 2007: Bicyclists and Other Cyclists. National Highway Traffic Safety Administration (NHTSA), U.S. Dept. of Transportation, 2007.

[6] 15 states require 3-foot clearance for bicycles - Biking Bis - Bicycle Touring and More web site., 2009.http://www.bikingbis.com/blog/_archives/2008/3/5/3549263.html.

[7] BatteryBar - Osiris Development web site., 2010. http://osirisdevelopment.com/BatteryBar.

[8] Bikely web site., 2010. http://www.bikely.com.

[9] Cerevellum web site., 2010. http://www.cerevellum.com.


[10] HYmini — miniSOLAR - Green Power is in your Hand - MINIWIZ S.E.Dev Ltd. web site., 2010.http://www.hymini.com.

[11] LightLane web site., 2010. http://www.lightlanebike.com.

[12] MapMyRide web site., 2010. http://www.mapmyride.com.

[13] Shimano DH-3N70 Dynohub - Peter White Cycles web site., 2010. http://www.peterwhitecycles.com/shimano3n70.asp.

[14] VLC Media Player - VideoLAN Project web site., 2010. http://www.videolan.org/vlc.

[15] S. Baker, D. Scharstein, and J. Lewis. A database and evaluation methodology for optical flow. In Proceedings of the IEEE International Conference on Computer Vision, October 2007.

[16] S. Chattopadhyay, X. Luo, S. M. Bhandarkar, and K. Li. FMOE-MR: content-driven multiresolution MPEG-4 fine grained scalable layered video encoding. In Proceedings of the Society of Photographic Instrumentation Engineers (SPIE), January 2007.

[17] S. B. Eisenman, E. Miluzzo, N. D. Lane, R. A. Peterson, G.-S. Ahn, and A. T. Campbell. The BikeNet mobile sensing system for cyclist experience mapping. In SenSys '07: Proceedings of the 5th International Conference on Embedded Networked Sensor Systems, 2007.

[18] A. Elgammal and C.-S. Lee. Separating style and content on a nonlinear manifold. In IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'04), June 2004.

[19] M. Enzweiler and D. Gavrila. Monocular pedestrian detection: Survey and experiments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(12):2179–2195, December 2009.

[20] D. Gavrila. Traffic sign recognition revisited. In Mustererkennung 1999, 21. DAGM-Symposium, pages86–93, London, UK, 1999. Springer-Verlag.

[21] B. K. P. Horn. Robot Vision. The MIT Press, Cambridge, Massachusetts, 1986.

[22] B. Hull, V. Bychkovsky, Y. Zhang, K. Chen, M. Goraczko, A. Miu, E. Shih, H. Balakrishnan, andS. Madden. CarTel: A Distributed Mobile Sensor Computing System. In SenSys’06, 2006.

[23] T. Jochem, D. Pomerleau, and C. Thorpe. Vision-based neural network road and intersection detection.page xx, 1995.

[24] H. Jung, B. Kwak, J. Shim, P. J. Yoon, and J. Kim. Precrash dipping nose (PCDN) needs pedestrian recognition. 9(4):678–687, December 2008.

[25] S. Kenue. LANELOK: An algorithm for extending the lane sensing operating range to 100 feet. Volume 1388, 1990.

[26] D. Li, I. K. Sethi, N. Dimitrova, and T. McGee. Classification of general audio data for content-basedretrieval. Pattern Recogn. Lett., 22(5):533–544, 2001.

[27] H. Lu, W. Pan, N. D. Lane, T. Choudhury, and A. T. Campbell. SoundSense: scalable sound sensing for people-centric applications on mobile phones. In MobiSys '09: Proceedings of the 7th International Conference on Mobile Systems, Applications, and Services, June 2009.

[28] L. Ma, B. Milner, and D. Smith. Acoustic environment classification. ACM Trans. Speech Lang. Process.,3(2):1–22, 2006.

[29] J. McCall and M. Trivedi. An integrated, robust approach to lane marking detection and lane tracking. Pages 533–537, 2004.

[30] J. Moszczynski. N.J. Bicyclists May Get More Room on Road. http://www.nj.com/news/index.ssf/2009/04/cyclists_hope_proposed_safety.html.


[31] M. Polychronopoulos, N. Möhler, S. Ghosh, and A. Beutner. System design of a situation adaptive lane keeping support system: the SAFELANE system. Pages 169–183, 2006.

[32] D. A. Ross, J. Lim, R.-S. Lin, and M.-H. Yang. Incremental learning for robust visual tracking. Inter-national Journal of Computer Vision (Special Issue: Learning for Vision), 77:125 – 141, May 2008.

[33] J. Saunders. Real-time discrimination of broadcast speech/music. In ICASSP '96: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 993–996, Washington, DC, USA, 1996. IEEE Computer Society.

[34] M. Werlberger, W. Trobin, T. Pock, A. Wedel, D. Cremers, and H. Bischof. Anisotropic Huber-L1 optical flow. In Proceedings of the British Machine Vision Conference (BMVC), London, UK, September 2009.

[35] B. Yu and A. Jain. Lane boundary detection using a multiresolution Hough transform. In International Conference on Image Processing, 2:748, 1997.
