
Tracking pitches for broadcast television - Computerbaseball.physics.illinois.edu/trackingbaseballs.pdf · Tracking Pitches for Broadcast Television I n baseball, a pitcher’s fame


0018-9162/02/$17.00 © 2002 IEEE · Computer

Tracking Pitches for Broadcast Television

In baseball, a pitcher's fame and fortune depend on his mastery of the strike zone. Pitches that pass outside the strike zone count as balls, and the batter can safely let them go by. Those that pass through it untouched, however, count as strikes, three of which will retire the batter to his team's dugout. Players, fans, and sports journalists thus have an intense interest in compiling statistics about these pitches, as do the umpires who determine each pitch's status when it crosses the plate.

During the 2001 Major League Baseball season, the strike zone received special attention when officials decided to enforce the game's original strike zone definition, which places the zone's upper limit midway between the batter's shoulders and belt. In the past, major-league umpires had rarely called a strike above the belt. League officials and journalists thought that enforcing the original definition would have so significant an effect that it might change the hierarchy of hitters and pitchers. Further, 2001 turned out to be a particularly exciting year for baseball as Barry Bonds pursued, and ultimately surpassed, Mark McGwire's single-season home run record, set in 1998.

These developments made tracking pitches accurately more important than ever. Tracking the flight of a pitch during a live broadcast presents two major challenges, however: speed and image-processing reliability. Speed is an issue because calculating the trajectory quickly enough practically requires real-time processing of the video, which arrives at 60 fields per second.

Ensuring image-processing reliability, on the other hand, requires overcoming several obstacles.

During a baseball game, dramatic changes in lighting conditions and the movement of objects and players can result in a shifting pattern of light and color that makes it especially difficult to track a pitched ball. Further, several ballparks have a net in place behind home plate, which adds to the visual clutter that the image-processing system must filter out when tracking the baseball.

Meeting these challenges required developing a complex system that fuses high-end computer graphics with a sophisticated algorithm for calculating flight trajectories. The ESPN K Zone system uses computer-generated graphics to create a shaded, translucent box that outlines the strike zone boundaries for viewers. Behind the flashy graphics, K Zone (named after a synonym for the strike zone) is a sophisticated computing system that monitors each pitch's trajectory.¹

K ZONE TAKES SHAPE

In February 2001, ESPN contracted with Sportvision to build a system for analyzing baseball pitches during its Major League Baseball broadcasts. ESPN wanted a system that would determine electronically, within one to two centimeters, whether each pitch qualified as a strike or a ball. The system would then draw a representation of the strike zone on the TV screen, superimposed over the replayed broadcast video, to clearly show the pitch's status. ESPN chose Sportvision for the project because of the company's track record in graphically enhancing sports broadcasts.

André Guéziec, Triangle Software

A cable network's desire to enhance its baseball telecasts resulted in K Zone, a computerized video tracking system that may have broader applications.

COMPUTING PRACTICES


System overview

Figure 1 shows K Zone in action during a television broadcast. ESPN insisted that the effect appear on the program video as an integral part of the scene, not as a separate graphical animation. To fulfill this requirement, the developers minimized the graphics so that they would not obscure any part of the game.

The overall broadcast enhancement system uses three subsystems to produce the final televised graphics:

• The camera pan-tilt-zoom encoding subsystem calibrates the broadcast cameras in real time.

• The measurement subsystem detects the baseball's trajectory, measures the batter's stance, and determines whether the pitch is a strike or a ball.

• The graphic overlay subsystem uses these measurements to produce the televised graphics. To draw them in the proper position, this subsystem needs the real-time calibration data that the camera subsystem provides.

The measurement subsystem consists of three PCs connected to three video cameras. Two of the cameras observe the baseball to track a pitch's flight toward the strike zone, while the third observes the batter to provide proper sizing for the strike zone.

To calibrate the broadcast cameras, technicians install an encoder on each camera that measures the pan and tilt angles, the zoom voltage, and the zoom extender position. The encoders collect these measurements 30 times per second and transmit them to the graphic overlay subsystem.

The graphic overlay subsystem renders a graphic and superimposes it on the broadcast video. This graphic consists of two video streams: the fill, which contains the actual graphic, and the key, which contains the transparency map that indicates which video pixels the graphic affects. These two streams are input to the linear keyer, a piece of video equipment that overlays the graphic on the broadcast video. The graphic overlay subsystem uses an SGI O2 computer to draw a three-dimensional representation of the strike zone in the position that the broadcast camera's pan, tilt, and zoom parameters specify.
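In its simplest form, the keyer's mix is a per-pixel weighted average in which the key acts as the opacity of the fill. The sketch below shows only that arithmetic, assuming 8-bit samples, an unshaped (not premultiplied) fill, and a key in which 255 means fully opaque; real keyers do this on video signals in dedicated hardware, and `linear_key` is a hypothetical name.

```python
def linear_key(video, fill, key):
    """Blend a graphic (fill) over broadcast video using the key as per-pixel opacity.

    All arguments are equal-length sequences of 8-bit samples (0..255). A key of 0
    leaves the video untouched; a key of 255 replaces it with the fill.
    """
    out = []
    for v, f, k in zip(video, fill, key):
        alpha = k / 255.0                                  # opacity of the graphic
        out.append(round(alpha * f + (1.0 - alpha) * v))   # linear mix of the two sources
    return out


# One line of three samples: outside, at the translucent edge, and inside the graphic.
print(linear_key([16, 128, 235], [200, 200, 200], [0, 128, 255]))   # [16, 164, 200]
```

Intermediate key values are what let the strike-zone box look translucent rather than pasted on.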

Although Sportvision had used the camera and graphic overlay subsystems in its broadcast enhancements for several years, using them with K Zone required modifications. The measurement subsystem had to be built from scratch.

Measurement subsystem

K Zone's measurement subsystem uses two Pentium 4 PCs running Windows 2000, linked to two video cameras that observe each pitch. An operator uses a third camera and PC to locate the strike zone's top and bottom boundaries. All these components fit conveniently in one short equipment rack.

During operation, each PC processes the video for one camera in real time. The processing uses a four-thread software architecture: one thread reads the video frame into memory, a second displays the video, a third handles the image processing, and the fourth writes the video to disk.
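A minimal sketch of this four-thread arrangement, using Python's standard `threading` and `queue` modules in place of the original Windows 2000 implementation. The reader fans each frame out to one queue per consumer; the `detect` function and the string frames are hypothetical placeholders.

```python
import queue
import threading

STOP = object()   # sentinel telling a consumer thread to exit


def detect(frame):
    """Hypothetical stand-in for the image-processing step."""
    return f"candidates in {frame}"


def run_pipeline(frames):
    """Reader thread fans each frame out to display, processing, and writer threads."""
    queues = {"display": queue.Queue(), "process": queue.Queue(), "write": queue.Queue()}
    results = {"display": [], "process": [], "write": []}

    def reader():
        for frame in frames:                 # stands in for capturing from the video card
            for q in queues.values():
                q.put(frame)
        for q in queues.values():
            q.put(STOP)

    def consumer(name, handle):
        q = queues[name]
        while True:
            item = q.get()
            if item is STOP:
                break
            results[name].append(handle(item))   # only this thread touches this list

    threads = [threading.Thread(target=reader)]
    threads += [
        threading.Thread(target=consumer, args=("display", lambda f: f)),  # show frame
        threading.Thread(target=consumer, args=("process", detect)),      # find the ball
        threading.Thread(target=consumer, args=("write", lambda f: f)),    # save to disk
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


print(run_pipeline(["field 0", "field 1"])["process"])
```

Each consumer drains its own queue, so a slow disk write does not hold up detection. The queues here are unbounded; a real-time system would more likely bound them and skip fields rather than let memory grow.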

We tested the pitch-tracking system extensively before ESPN used it in its broadcasts. Technicians checked various camera locations for tracking the baseball, selected the views that permitted the most reliable detection, and refined the tracking algorithm. These tests took place over several weeks early in the season at baseball games played in Oakland, Minneapolis, and New York.

TRACKING CONSTRAINTS

To accomplish pitch tracking, the developers needed to deal with four primary constraints.

Performance

Full-resolution digital NTSC (National Television System Committee) or PAL (phase alternating line) video requires 270 Mbits per second of bandwidth. Importing, displaying, and exporting this data in real time takes several passes through the personal computer's PCI bus, stretching it nearly to capacity. Doing all this data transmission in real time requires carefully optimized software engineering. At the very least, multithreading is essential to keep the CPU working on the video-processing pipeline while it waits for the next video field or frame to arrive. The system therefore decomposes the video-processing pipeline into tasks that execute independently in a thread-safe manner.
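The 270-Mbit/s figure follows from the standard digital component sampling structure (ITU-R BT.601, 4:2:2) carried on a serial digital interface, and the field rate fixes how long the software has to process each field. A short worked calculation, assuming 10-bit samples and the nominal NTSC rate of 60 fields per second (PAL runs at 50, which the same function handles); the constant and function names are illustrative.

```python
# ITU-R BT.601 component sampling, as carried on a standard serial digital interface.
LUMA_RATE_HZ = 13.5e6      # luminance (Y) samples per second
CHROMA_RATE_HZ = 6.75e6    # each color-difference signal (Cb, Cr) in 4:2:2 sampling
BITS_PER_SAMPLE = 10


def serial_bitrate_mbps():
    """Total bit rate in Mbit/s: (Y + Cb + Cr sample rates) x bits per sample."""
    samples_per_second = LUMA_RATE_HZ + 2 * CHROMA_RATE_HZ   # 27 million samples/s
    return samples_per_second * BITS_PER_SAMPLE / 1e6


def field_budget_ms(fields_per_second=60):
    """Time available to process one field before the next one arrives."""
    return 1000.0 / fields_per_second


print(serial_bitrate_mbps())          # 270.0
print(round(field_budget_ms(), 1))    # 16.7
```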

Figure 1. K Zone during a televised game. The pitch-tracking effect is an integral and unobtrusive part of the telecast.

March 2002

Real-time operation

Although ESPN planned to use the system for replays, we designed the image-processing pipeline to work in real time, keeping pace with the video frame rate with a latency of two seconds. Such a design allows using the effect live, provided the program is broadcast with a delay of two seconds or longer. Sports broadcasts commonly apply such delays.
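At 60 fields per second, a two-second delay amounts to 120 fields of headroom within which the tracking result must be ready. A minimal sketch of a fixed delay line that shows this timing relationship; the `FieldDelay` class is hypothetical, and the article does not describe how the broadcast delay itself was implemented.

```python
from collections import deque


class FieldDelay:
    """Fixed delay line: each field comes back out a set number of field periods later."""

    def __init__(self, seconds=2.0, fields_per_second=60):
        self.depth = round(seconds * fields_per_second)   # 2 s x 60 fields/s = 120 fields
        self.buffer = deque()

    def push(self, field):
        """Accept the newest field; return the one from `depth` fields ago (None while filling)."""
        self.buffer.append(field)
        if len(self.buffer) > self.depth:
            return self.buffer.popleft()
        return None


delay = FieldDelay()
outputs = [delay.push(i) for i in range(125)]
print(delay.depth, outputs[119], outputs[120])   # 120 None 0
```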

Even when used for replays alone, the pitch-tracking system still needed to process video quickly. Creating a successful replay required executing several steps in rapid succession. In addition to processing the video, using the system required coordinating with ESPN's television operators to cue the appropriate footage. So pressing were these constraints that the show's director would periodically cancel replays for lack of time.

Reliability

Image processing and computer vision have been well-established academic fields since the 1970s. Although academics have published many applications in these fields, few of them have proven highly reliable and successful in practice. Academic work tends to emphasize pure innovation and mathematical elegance, demonstrated with a few carefully chosen test cases, rather than reliability. Partly because taking measurements only seconds before airing them on television can be extremely stressful, engineers who build applications for commercial broadcasting tend to emphasize reliability and repeatability.

Developing the image-processing system required meeting a particularly significant challenge: designing an algorithm that would work in all possible lighting conditions for an event staged outdoors. The image-processing system needed to function in sunny or overcast lighting, during the day and at night, on both well-lit and shadowed subjects, on scenes composed of different viewing angles or involving markedly different backgrounds, and on images filtered through a foreground net.

Efficient detection

Developing a successful tracking algorithm provided the key to making K Zone work. To track pitches effectively, the system needed to isolate the ball and follow it throughout its trajectory. We began with color cameras working in the standard and less costly interlaced mode at 30 Hz, which meant that the video would contain 60 fields, or half frames, per second. Interlacing, the television standard, displays two fields in alternating lines of a frame; each frame therefore represents two different moments in time.
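Treating each field as a separate image, at half the vertical resolution, is what yields 60 ball positions per second from 30-Hz cameras. A minimal sketch, assuming a frame stored as a list of scan lines; the function name is hypothetical.

```python
def split_fields(frame):
    """Split an interlaced frame, given as a list of scan lines, into its two fields.

    The fields were captured about 1/60 s apart; which one comes first in time
    depends on the video standard and the capture hardware.
    """
    even_field = frame[0::2]   # scan lines 0, 2, 4, ...
    odd_field = frame[1::2]    # scan lines 1, 3, 5, ...
    return even_field, odd_field


frame = [f"line {i}" for i in range(6)]
even, odd = split_fields(frame)
print(even)   # ['line 0', 'line 2', 'line 4']
```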

Each camera has a field of view that covers about half the baseball's flight. Because of this relatively wide field of view, the baseball's image consists of only a few pixels. Depending on the view, background, foreground, and other factors, the baseball can appear as no more than two pixels after detection. In one view, the ball passes near, and often over, the white foul line, creating a white-on-white image.

Several moving objects and shadows could be mistaken for the ball as well. The home-plate umpire, catcher, and batter typically stay still, then move swiftly and precisely when the ball is pitched, at the very moment the computers are busy detecting it. Baseball uniforms typically have white or gray patches, such as a white handkerchief hanging from the umpire's pocket. Helmets exhibit specular highlights that can be mistaken for the ball in some views.

Figure 2 shows how a later-abandoned image-processing algorithm based on multiresolution pattern matching produced misleading results on the video sequence taken from the centerfield position. Compare this image with Figure 3, generated by an algorithm that I developed. In both figures, the system draws the successive detected positions of the baseball, whether successfully tracked or not, as colored squares superimposed on the images of the corresponding video sequence. In Figures 3b and 3c, each still frame represents two fields and shows two successive baseball positions, one for each field. The baseball locations in the images appear as small green blobs of just a few pixels, with the interlacing causing the color to skip every other line.

TRACKING ALGORITHM

Instead of using pattern matching, the K Zone tracking algorithm exploits the kinematic properties of the baseball's flight. The first step, however, is to qualify a potential baseball position in the image.

A potential ball position corresponds to a group of adjacent pixels, or blob, that satisfies simple criteria in terms of size, shape, brightness, and color. Specifically, the algorithm eliminates anything that is too colorful, looking instead for a grayish shape with perhaps a little red dirt or green from the grass on it. More importantly, these adjacent pixels must differ significantly from what occupied that section of the image in previous fields.
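A minimal sketch of this candidate test, assuming an RGB image and a motion mask from background differencing are already available. The thresholds are hypothetical (the article gives no numeric values), brightness and colorfulness are approximated by the mean and the spread of the R, G, and B values, and blobs are found with a 4-connected flood fill.

```python
from collections import deque

# Hypothetical thresholds; the article does not publish K Zone's values.
MIN_BRIGHTNESS = 120        # mean of R, G, B on a 0..255 scale
MAX_SATURATION = 60         # max(R, G, B) - min(R, G, B); small means grayish
MIN_AREA, MAX_AREA = 2, 40  # blob size in pixels
MAX_ELONGATION = 3.0        # bounding-box length / width


def is_ball_colored(rgb):
    """Bright and nearly gray, tolerating a little dirt or grass tint."""
    return sum(rgb) / 3 >= MIN_BRIGHTNESS and max(rgb) - min(rgb) <= MAX_SATURATION


def ball_candidates(image, moving):
    """Centroids (row, col) of blobs passing the color, motion, size, and shape tests.

    image: list of rows of (R, G, B) tuples; moving: same-shaped rows of booleans
    produced by background differencing.
    """
    h, w = len(image), len(image[0])
    ok = [[moving[y][x] and is_ball_colored(image[y][x]) for x in range(w)]
          for y in range(h)]
    seen = [[False] * w for _ in range(h)]
    found = []
    for y0 in range(h):
        for x0 in range(w):
            if not ok[y0][x0] or seen[y0][x0]:
                continue
            blob, todo = [], deque([(y0, x0)])   # flood-fill one 4-connected blob
            seen[y0][x0] = True
            while todo:
                y, x = todo.popleft()
                blob.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if 0 <= ny < h and 0 <= nx < w and ok[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        todo.append((ny, nx))
            ys, xs = [p[0] for p in blob], [p[1] for p in blob]
            height, width = max(ys) - min(ys) + 1, max(xs) - min(xs) + 1
            if (MIN_AREA <= len(blob) <= MAX_AREA
                    and max(height, width) / min(height, width) <= MAX_ELONGATION):
                found.append((sum(ys) / len(ys), sum(xs) / len(xs)))
    return found
```

On real footage this test is deliberately loose: it is expected to pass a dozen or so blobs per field and leave the final choice to the trajectory logic.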

Figure 2. Broken trajectory mapping. This ultimately unsuccessful algorithm seeks the baseball pattern in the color image, resulting in the scattered ball positions that the red squares denote. The algorithm became ineffective when lighting conditions, the field, and team colors combined to make parts of the uniforms and background look more like a baseball than the baseball itself.


Background subtraction

To estimate the probability of observing a given pixel value, the tracking system draws samples of intensity values from previous frames or fields.² A Gaussian distribution works well for a background pixel that does not change over time. To handle variations over time, the system uses a mixture of Gaussian distributions centered on recently observed pixel values for a given location. Pitch tracking updates these models continuously to follow changes in the background caused by moving shadows and lighting variations, such as the switch from natural to artificial light. If the current pixel value is improbable under this model, deviating significantly from the predicted value, the system registers the change as motion. The various pixel locations a ball occupies as it passes through the camera's field of view generally satisfy this criterion.
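A minimal sketch of such a model for a single grayscale pixel, in the spirit of the nonparametric approach cited in reference 2 but simplified: each stored sample is the center of a Gaussian kernel, and a value is flagged as motion when the averaged density falls below a threshold. The sample count, kernel width, and threshold are hypothetical, and the model is updated with every value.

```python
import math
from collections import deque


class PixelBackgroundModel:
    """Nonparametric model of one pixel's recent intensities.

    Each stored sample is the center of a Gaussian kernel; the density of a new
    value is the average of those kernels. A low density flags motion.
    """

    def __init__(self, n_samples=20, sigma=8.0, threshold=1e-3):
        self.samples = deque(maxlen=n_samples)   # the oldest sample drops out automatically
        self.sigma = sigma                       # kernel width, in intensity levels
        self.threshold = threshold               # densities below this count as motion

    def density(self, value):
        if not self.samples:
            return 0.0
        norm = 1.0 / (math.sqrt(2.0 * math.pi) * self.sigma)
        return sum(norm * math.exp(-0.5 * ((value - s) / self.sigma) ** 2)
                   for s in self.samples) / len(self.samples)

    def is_motion(self, value):
        return self.density(value) < self.threshold

    def update(self, value):
        """Add the newest observation so the model follows shadows and lighting changes."""
        self.samples.append(value)


model = PixelBackgroundModel()
for v in [100, 102, 98, 101, 99] * 4:     # a stable patch of grass
    model.update(v)
print(model.is_motion(101), model.is_motion(200))   # False True
```

A full system would keep one such model per pixel and might skip updating pixels currently flagged as motion, so that a passing ball does not leak into the background.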

Size, shape, color, and background-differencing criteria, although significantly constraining, cannot by themselves distinguish a pitched ball from the various other moving elements in a video sequence. Depending on the view, perhaps 10 to 15 blobs of pixels satisfy these criteria at a given time, with at most one corresponding to the ball's correct location. Figure 3, for example, shows all ball candidates as green blobs of pixels. Narrowing the cameras' field of view could resolve this problem at least partially, because each camera would then see a larger and thus less ambiguous image of the ball. But doing so would require more cameras, each showing less of the trajectory.³ Even if there had been time to implement this upgrade, it would have increased the system's cost and complexity significantly.

Finite state machine

The system does not necessarily select the correct blob instantly from among all candidates. Rather, the selection depends on the past and future blob sets and the consistency the system can find among them. The selection algorithm uses a finite state machine with two states. In the first state, the algorithm looks for physically plausible trajectories. It checks the angle, the velocity in pixels per field, and the trajectory's deviation from a locally linear fit of the samples. If the tracking system can match enough candidates in successive fields, they serve as seeds for starting a trajectory in the image's two-dimensional plane.
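A minimal sketch of these three checks on a short run of candidate positions, one per field, assuming hypothetical thresholds (suitable values would depend on each camera's view) and fitting x and y separately as straight lines over the field index.

```python
import math

# Hypothetical limits; suitable values depend on each camera's view.
MIN_SPEED, MAX_SPEED = 3.0, 40.0     # pixels per field
MIN_ANGLE, MAX_ANGLE = -60.0, 60.0   # image direction of travel, degrees from +x
MAX_RESIDUAL = 1.5                   # pixels from the straight-line fit


def line_fit(ts, vs):
    """Least-squares fit v = a + b * t; returns (a, b)."""
    n = len(ts)
    mt, mv = sum(ts) / n, sum(vs) / n
    stt = sum((t - mt) ** 2 for t in ts)
    b = sum((t - mt) * (v - mv) for t, v in zip(ts, vs)) / stt
    return mv - b * mt, b


def plausible_seed(track):
    """track: at least three (field_index, x, y) candidates from distinct fields."""
    ts = [t for t, _, _ in track]
    ax, bx = line_fit(ts, [x for _, x, _ in track])   # bx, by: velocity, pixels/field
    ay, by = line_fit(ts, [y for _, _, y in track])
    speed = math.hypot(bx, by)
    angle = math.degrees(math.atan2(by, bx))
    residual = max(math.hypot(x - (ax + bx * t), y - (ay + by * t)) for t, x, y in track)
    return (MIN_SPEED <= speed <= MAX_SPEED
            and MIN_ANGLE <= angle <= MAX_ANGLE
            and residual <= MAX_RESIDUAL)


print(plausible_seed([(0, 10, 50), (1, 20, 52), (2, 30, 54), (3, 40, 56)]))  # True
print(plausible_seed([(0, 5, 5), (1, 5, 5), (2, 5, 5)]))                      # False
```

The second call rejects a stationary highlight, such as a glint on a helmet, simply because it does not move.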

To acquire a plausible trajectory, the system delays the image processing by a small number of fields so that the algorithm can look both ahead and back a few fields, corresponding to a time lapse of perhaps one tenth of a second, for potential ball positions that form a consistent trajectory.

Once the algorithm has seeded a trajectory, it switches to the second state, in which it tests all the potential blobs for a fit with the existing trajectory. The actual mathematics of the trajectory use a Kalman filter, as the "Using Kalman Filtering to Track Trajectories" sidebar describes.
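A minimal sketch of the two-state logic, assuming one list of (x, y) candidates per field. To keep the example self-contained, a constant-velocity extrapolation stands in for the Kalman filter the article actually uses, acquisition looks back over only the three most recent fields, and the gate, speed, and miss limits are hypothetical.

```python
import itertools
import math

ACQUIRE, TRACK = "acquire", "track"


class PitchTracker:
    """Two-state tracker fed one list of (x, y) ball candidates per video field.

    ACQUIRE: look for one candidate in each of the last three fields that moves
    fast enough and at roughly constant velocity, and use the triple as a seed.
    TRACK: predict the next position and accept the nearest candidate in a gate.
    """

    def __init__(self, gate=4.0, min_speed=3.0, max_misses=2):
        self.gate = gate              # acceptance radius, pixels (hypothetical)
        self.min_speed = min_speed    # pixels per field; rejects static highlights
        self.max_misses = max_misses  # fields without a match before re-acquiring
        self.state = ACQUIRE
        self.field = -1               # index of the current field
        self.window = []              # (field, candidates) for the most recent fields
        self.track = []               # accepted (field, x, y) positions
        self.misses = 0

    def step(self, candidates):
        self.field += 1
        if self.state == ACQUIRE:
            self.window = (self.window + [(self.field, candidates)])[-3:]
            if len(self.window) == 3:
                (fa, ca), (fb, cb), (fc, cc) = self.window
                for a, b, c in itertools.product(ca, cb, cc):
                    # Extrapolate a -> b by one more field and compare with c.
                    expected = (2 * b[0] - a[0], 2 * b[1] - a[1])
                    if math.dist(a, b) >= self.min_speed and math.dist(expected, c) <= self.gate:
                        self.track = [(fa, *a), (fb, *b), (fc, *c)]
                        self.state, self.misses = TRACK, 0
                        break
            return self.state
        # TRACK: constant-velocity extrapolation from the last two accepted positions.
        (f1, x1, y1), (f2, x2, y2) = self.track[-2], self.track[-1]
        vx, vy = (x2 - x1) / (f2 - f1), (y2 - y1) / (f2 - f1)
        dt = self.field - f2
        predicted = (x2 + vx * dt, y2 + vy * dt)
        best = min(candidates, key=lambda c: math.dist(c, predicted), default=None)
        if best is not None and math.dist(best, predicted) <= self.gate:
            self.track.append((self.field, *best))
            self.misses = 0
        else:
            self.misses += 1
            if self.misses > self.max_misses:
                self.state, self.window = ACQUIRE, []
        return self.state
```

Tolerating a few missed fields lets the tracker ride through a brief occlusion, such as the ball passing behind a strand of the backstop net, without restarting acquisition.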

Because the process uses regular interlaced NTSC video, a still frame contains two successive positions of the baseball, shown as green blobs in the figures. Background and foreground variations can complicate trajectory acquisition. Figure 3b, for example, shows how the foreground (with or without the net) and the background (grass or dirt) vary dramatically within the same trajectory.

Figure 3. Successful trajectory mapping. The final algorithm chooses the correct ball positions, shown as red squares, from among a variety of candidates, shown as green patches, whose shape, color, and brightness patterns resemble those of the baseball. K Zone provided three views of each pitch: (a) batter close-up (for sizing the strike zone only), (b) high home, and (c) high first base. In these stills, used for real-life detection, the shutter speed necessary to keep the ball's image from blurring causes some of the shots to be dark.

DETERMINING THREE-DIMENSIONAL TRAJECTORIES

Our two pitch-tracking computers work as a team to compute each final trajectory. One observes the view from high home, the other from high first base, and they exchange the two-dimensional positions the tracking algorithm computes in each view in real time over a local Ethernet connection. K Zone uses synchronized time-code generators to inscribe a field-accurate time code into each field of each camera view. The pitch-tracking computers use the time codes to tag each two-dimensional position unambiguously.
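A minimal sketch of tagging and pairing detections by time code, assuming non-drop-frame "HH:MM:SS:FF" time code at a nominal 30 frames per second plus a separate field flag (NTSC practice often uses drop-frame time code, which this ignores). The function names, the dictionary layout, and the sample coordinates are hypothetical.

```python
def field_index(timecode, field, fps=30):
    """Absolute field number for a non-drop-frame "HH:MM:SS:FF" time code.

    field is 0 for the first field of the frame and 1 for the second.
    """
    hh, mm, ss, ff = (int(part) for part in timecode.split(":"))
    frames = ((hh * 60 + mm) * 60 + ss) * fps + ff
    return 2 * frames + field


def pair_views(view_a, view_b):
    """Match 2D detections from two cameras that carry the same field time code.

    Each view maps field index -> (x, y); the result maps field index ->
    ((xa, ya), (xb, yb)) for fields detected in both views.
    """
    return {f: (view_a[f], view_b[f]) for f in sorted(view_a.keys() & view_b.keys())}


home = {field_index("19:05:12:03", 0): (412, 233), field_index("19:05:12:03", 1): (420, 236)}
first_base = {field_index("19:05:12:03", 1): (150, 301)}
print(pair_views(home, first_base))
```

Only fields detected in both views produce a pair, which is exactly what the three-dimensional step needs.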

One of the two pitch-tracking computers intersects two lines of sight, one from each camera, to combine two two-dimensional positions that share the same time code into a three-dimensional position. More specifically, the computer calculates the point of closest approach between the two straight lines, which in practice rarely intersect exactly. Each two-dimensional position, or pixel, in a camera view can be associated with a straight line in three-space that essentially depicts the path of the photons that hit that pixel. Figure 4 illustrates the process of using intersecting line-of-sight pairs to locate a baseball's successive positions. Associating each pixel with a line of sight is called camera calibration⁴ and presents several accuracy challenges, especially in a large environment such as a ballpark.
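The point of closest approach has a standard closed form. A minimal sketch, assuming calibration has already turned each 2D detection into a line of sight (a camera center plus a direction) in a common world frame. Returning the midpoint of the shortest connecting segment is one common convention, and that segment's length serves as a sanity check on calibration and detection; the function name is hypothetical.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def closest_approach(p1, d1, p2, d2):
    """Closest approach between the lines p1 + s*d1 and p2 + t*d2 in 3D.

    Returns (midpoint of the shortest connecting segment, length of that segment).
    """
    w0 = [u - v for u, v in zip(p1, p2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b                # zero when the lines are parallel
    if abs(denom) < 1e-12:
        raise ValueError("lines of sight are parallel")
    s = (b * e - c * d) / denom          # parameter of the closest point on line 1
    t = (a * e - b * d) / denom          # parameter of the closest point on line 2
    q1 = [p + s * v for p, v in zip(p1, d1)]
    q2 = [p + t * v for p, v in zip(p2, d2)]
    midpoint = tuple((u + v) / 2 for u, v in zip(q1, q2))
    gap = sum((u - v) ** 2 for u, v in zip(q1, q2)) ** 0.5
    return midpoint, gap


# Two cameras whose lines of sight both pass through the point (1, 2, 3).
print(closest_approach((0, 0, 10), (1, 2, -7), (10, 0, 10), (-9, 2, -7)))
# ((1.0, 2.0, 3.0), 0.0)
```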

To determine the final trajectory, the system feeds successive three-dimensional positions into another Kalman filter.

The third computer and camera provide a close-up view of the batter. A TV camera operator uses a joystick to locate the strike zone's top and bottom on the live video. Since the rule book says a strike occurs when any part of the baseball passes through any part of the strike zone, K Zone computes the intersection of the strike zone's volume, a pentagonal prism, with a cylinder that has the same radius as the baseball, centered on the computed trajectory. The tracking system reports an intersection as a strike and the absence of an intersection as a ball. In either case, the system reports the point where the baseball's trajectory meets the front plane of the strike zone and draws this point on the TV screen.

USER AND VIEWER FEEDBACK

K Zone launched on 1 July 2001 during ESPN's Sunday Night Baseball, and the network used it to augment every edition of the program aired that summer. Estimates from ESPN's published ratings indicate that about 13 million viewers tuned in to the show each week and watched the graphical representation of the strike zone as each pitch either hit the zone for a strike or missed it for a ball.

Even though replays presented only a small fraction of the 300 to 400 pitches thrown each game, the

Invented in 1960 by Rudolf E. Kalman, Kalman filtering¹ is a commonly used technique for removing measurement errors and estimating a system's variables. Vectors represent the system parameters and the measurements, and linear equations must describe how the system and the measurements evolve over time. The Kalman filter provides optimal estimates of the system parameters, such as position and velocity, given measurements and knowledge of the system's behavior.

In general, the Kalman filter assumes that the following two relations describe a system:

$$x_k = A_k\,x_{k-1} + w_k \tag{1}$$

$$z_k = H_k\,x_k + v_k \tag{2}$$

$x_k$ is the state vector, such as a position and velocity, perhaps with an acceleration or other parameters, while $z_k$ is the measurement, such as a position. $x_0$, $w_k$, and $v_k$ are mutually uncorrelated vectors, and $w_k$ (process noise, or process evolution) and $v_k$ (measurement noise) are white noise sequences. The first equation determines the evolution of the state over time, and the second relates measurement and state.

Once this is established, a recursive algorithm estimates $x_k$ optimally.² In this case, "optimally" means that the estimate minimizes the mean squared error between the true and the estimated state, given all the measurements received so far.

The Kalman filter algorithm uses the following notation, dropping the time index k: the symbol := denotes assignment, P is the covariance matrix of the error in the state estimate, Q is the process noise covariance matrix, R is the measurement noise covariance matrix, and K is the Kalman gain matrix. The algorithm consists of two main steps, as follows.

The time update, or prediction step:

$$x := A\,x \tag{3}$$

$$P := A\,P\,A^{T} + Q \tag{4}$$

and the measurement update step, which adapts the state to the new measurement value:

$$K_1 := H\,P\,H^{T} + R \tag{5}$$

$$K := P\,H^{T}K_1^{-1} \tag{6}$$

$$x := x + K\,(z - H\,x) \tag{7}$$

$$P := (I - K\,H)\,P \tag{8}$$
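Equations 3 through 8 translate almost line for line into code. A minimal sketch in plain Python, assuming a two-dimensional constant-velocity model (state: image position and velocity, one time step per video field) with hypothetical noise settings; the sidebar does not state which state model or noise values K Zone used.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def madd(A, B, sign=1.0):
    """A + sign * B, element by element."""
    return [[a + sign * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def inv2(M):
    """Inverse of a 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


class KalmanFilter:
    """Constant-velocity model: state (x, y, vx, vy), measurement (x, y), one step per field."""

    def __init__(self, x0, y0, q=0.01, r=1.0, p0=100.0):
        self.A = [[1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0], [0, 0, 0, 1]]
        self.H = [[1, 0, 0, 0], [0, 1, 0, 0]]
        self.Q = diag([q] * 4)                # process noise covariance
        self.R = diag([r] * 2)                # measurement noise covariance
        self.P = diag([p0] * 4)               # large initial uncertainty
        self.x = [[x0], [y0], [0.0], [0.0]]   # column vector; velocity unknown at first

    def predict(self):
        self.x = matmul(self.A, self.x)                                           # (3)
        self.P = madd(matmul(matmul(self.A, self.P), transpose(self.A)), self.Q)  # (4)

    def update(self, z):
        Ht = transpose(self.H)
        K1 = madd(matmul(matmul(self.H, self.P), Ht), self.R)                     # (5)
        K = matmul(matmul(self.P, Ht), inv2(K1))                                  # (6)
        innovation = madd([[z[0]], [z[1]]], matmul(self.H, self.x), sign=-1.0)
        self.x = madd(self.x, matmul(K, innovation))                              # (7)
        self.P = matmul(madd(diag([1.0] * 4), matmul(K, self.H), sign=-1.0), self.P)  # (8)


kf = KalmanFilter(10.0, 40.0)
for t in range(1, 20):              # a ball moving 8 px right and 2 px down per field
    kf.predict()
    kf.update((10 + 8 * t, 40 + 2 * t))
print([round(kf.x[i][0], 2) for i in (2, 3)])   # velocity estimate, close to [8.0, 2.0]
```

Starting with a large P makes the first few gains large, so the estimate locks onto the measurements quickly; as P shrinks, new measurements move the state less.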

After a time update, the algorithm can reject a measurement z if it is too different from the predicted measurement $H\,x$ (computed from the state predicted in Equation 3), given the current uncertainty of that prediction, $H\,P\,H^{T} + R$ (the matrix $K_1$ of Equation 5). Also, if the algorithm must choose among several potential measurements to feed the filter, only one of which is correct, the correct choice will probably lie close to the predicted value, within the uncertainty region.
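The rejection and selection rules above amount to gating each candidate on its Mahalanobis distance from the prediction. A minimal sketch, assuming two-dimensional measurements and a hypothetical gate at the 99th percentile of a chi-square distribution with two degrees of freedom; the sidebar does not give K Zone's threshold, and the function names are illustrative.

```python
CHI2_2DOF_99 = 9.21   # 99th percentile of a chi-square distribution with 2 degrees of freedom


def mahalanobis_sq(z, predicted, S):
    """Squared Mahalanobis distance of a 2D measurement z from the predicted H x.

    S is the 2 x 2 prediction uncertainty H P H^T + R (K1 in Equation 5).
    """
    (a, b), (c, d) = S
    det = a * d - b * c
    nx, ny = z[0] - predicted[0], z[1] - predicted[1]
    # nu^T S^-1 nu with the closed-form 2 x 2 inverse [[d, -b], [-c, a]] / det
    return (nx * (d * nx - b * ny) + ny * (-c * nx + a * ny)) / det


def select_measurement(candidates, predicted, S, gate=CHI2_2DOF_99):
    """Closest candidate in the Mahalanobis sense, or None if all fall outside the gate."""
    scored = [(mahalanobis_sq(z, predicted, S), z) for z in candidates]
    inside = [item for item in scored if item[0] <= gate]
    return min(inside, key=lambda item: item[0])[1] if inside else None


S = [[4.0, 0.0], [0.0, 1.0]]   # prediction twice as uncertain in x (sigma 2) as in y (sigma 1)
print(select_measurement([(53, 50), (50, 52.5), (90, 10)], (50, 50), S))   # (53, 50)
```

Here (53, 50) wins over (50, 52.5), although the latter is closer in pixels, because the prediction is less certain horizontally than vertically; (90, 10) falls outside the gate altogether.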

Typically, P gradually decreases as the algorithm incorporates more measurements: confidence in the state builds up. Equation 7 shows that if K is large, which is the case when R is small (meaning there is little noise in the measurements), the new measurement z is weighted heavily. Conversely, if K is small, the value that the current state x predicts carries more weight. Thus K works as a gain.

References

1. G. Welch and G. Bishop, "An Introduction to the Kalman Filter," http://www.cs.unc.edu/~welch/kalman/kalmanIntro.html.

2. L. Levy, "The Kalman Filter: Navigation's Integration Workhorse," http://www.cs.unc.edu/~welch/media/pdf/Levy0997_kalman.pdf.

Using Kalman Filtering to Track Trajectories


system tracked every pitch with an extremely low failure rate. In this context, failure means that the system did not accurately detect the pitched baseball's trajectory. During the entire season, the system missed at worst a handful of pitches per game, and frequently none at all. Significantly, operator error accounted for most of the few reported failures.

ESPN used the system intensively, showing 17 replays in K Zone's debut game, then 20 to 30 replays per game thereafter. Viewers and critics alike responded positively to K Zone's visual effects in messages posted to various Internet forums. Several critics went further, urging ESPN to use the system on controversial umpire calls. Although the network rarely used K Zone in this capacity early in the season, it did so occasionally later in the season.

BROADER APPLICATIONS

A potential future application for the technologies used in K Zone may lie in computerized medical-image analysis. The medical literature documents that radiologists sometimes make mistakes in reading x-rays and other imaging studies, as do experienced, well-trained, and dedicated umpires when watching a baseball flying straight toward them at 90 miles per hour.

Although much progress has been made lately in the automated processing and understanding of medical images, the process of examining a chest x-ray, even digitally, remains surprisingly similar to what it was several decades ago. Healthcare providers could apply some of the computer vision techniques used to detect a baseball to these analyses, either directly or in a closely related form. Image differencing, which is related to background differencing, is commonly used in digital subtraction angiography, for example, to show a patient's arterial network after injection of a contrast material. Likewise, technicians could compute an anatomical shape's motion and trajectory in image sequences to determine how a clinical condition evolves over time or in response to treatment.

Computer vision is coming of age after decades of mostly academic pursuits. We finally have the computational power, at a reasonable price, and the proper expertise to apply this technology to challenging technical problems. However, while solving some problems might seem intuitively easy, in reality doing so may be tremendously difficult. In particular, the visual nature of these problems may give a false impression of simplicity: humans are typically very skilled at pattern recognition, when identifying faces, for instance, whereas computers obviously have none of these human perceptual skills built in. K Zone, built on a tight deadline that allowed only four months for its development, augmented existing technology to create a sophisticated and reliable system that has proven commercially viable and a valuable enhancement for sports fans. Further, ESPN is so pleased with the technology that it is seeking an Emmy nomination for K Zone. This development model may well prove effective in future computer vision projects, regardless of the application domain.

Acknowledgments

I thank Sportvision's Rick Cavallaro, Mike Cramer, Matt Lazar, Jim McGuffin, Alon Moses, and Marv White; J.R. Gloudemans; and Sportvision, ESPN, and Reality Check.

References

1. A. Guéziec, "Tracking a Baseball Pitch for Broadcast Television," http://www.trianglesoftware.com/pitch_tracking.htm.

2. A. Elgammal, D. Harwood, and L. Davis, "Non-parametric Model for Background Subtraction," http://fizbin.eecs.lehigh.edu/FRAME/Elgammal/bgmodel.html.

3. G. Pingali, Y. Jean, and A. Opalach, "Ball Tracking and Virtual Replays for Innovative Tennis Broadcasts," Proc. 15th Int'l Conf. Pattern Recognition (ICPR), IEEE CS Press, Los Alamitos, Calif., 2000.

4. O. Faugeras, Three-Dimensional Computer Vision, MIT Press, Cambridge, Mass., 1993.

André Guéziec is the founder and CEO of Triangle Software (http://www.trianglesoftware.com). He is the main developer and designer of the baseball tracking in K Zone. His interests include image processing and computer vision, notably as applied to medical imaging; 3D shape modeling and processing (particularly simplification); and modeling, simulation, and animation of road traffic. Guéziec received a PhD in computer science from the University of Paris at Orsay, where he specialized in medical image analysis. He is a senior member of the IEEE and holds several US and international patents. Contact him at [email protected].


Figure 4. Locating a baseball's successive positions. To locate a baseball's position in three dimensions, K Zone finds the point of closest approach between the two video cameras' lines of sight: high above home plate and high above first base.