Director’s Corner

Mrs. Gina Reichert, Director

The first semester of the 2014-15 school year has come to a close, and the students and staff of the Pittsburgh Gifted Center have demonstrated some amazing accomplishments!

Our school year began with our Back-to-School Event on September 3. Over 400 people attended, and families had the opportunity to meet their teachers, tour the facility, and participate in fun activities. Families joined us again for our annual Fall Harvest in October and participated in our first PSCC meeting of the year. PGC Saturday Enrichment Classes began on September 27 in areas including robotics, ceramics, book making, game design, and duct tape challenge.

Students in Mr. Nash’s Facts of Life course teamed up with the University of Pittsburgh and the Carnegie Science Center to conduct a Pittsburgh Drinking Water Microbiome Project. Students provided water samples, and the Carnegie Science Center used these samples and the data collected as part of a new exhibit called "H2O." Students in Ms. Foster-Wilhelm’s Environmental Chemistry class participated in the School Flag Program in cooperation with GASP (Group Against Smog and Pollution) and SPAQP (Southwestern Pennsylvania Air Quality Partnership), helping to inform the student body about outdoor air quality conditions.

In October, PGC was Passionately Pink in honor of Breast Cancer Awareness Month. As a show of our support, students and staff wore pink ribbons during the week of October 27. In November, GIEP Conferences took place with 85% of parents participating!

In December, our first student-developed and created Swap Meet and Flea Market took place. At this event, visitors had the opportunity to purchase products created by our "student entrepreneurs" from Ms. Gomez' Kids Company class. All proceeds from the sale benefited local charities. Students from Ms. Foster-Wilhelm's Friends of the Earth class planned a Swap Meet recycling gently used items, and the remaining items were donated to charity. Also, students from Ms. Blackwell’s Ceramics classes and the Saturday Empty Bowls Club participated in the Pittsburgh International Airport Holiday Tree Decorating Contest. The PGC tree received over 25,000 votes and won $400, which was donated to the Greater Pittsburgh Food Bank.

In January, in celebration of MLK Day and Black History Month, PGC sponsored a Creativity Contest. Students submitted entries that described or illustrated the positive impact an African American (contemporary or historical) had on their life. From February 2nd through 6th, all of the Gifted Center students participated in “School for a Day” at the Carnegie Science Center, where all classes conducted hands-on activities and learning. New communication methods have been used, including Constant Contact parent emails, Facebook, and Twitter.

Continued on Page 3

The Pittsburgh Public School District is an equal opportunity educational institution and will not discriminate on the basis of race, color, national origin, gender, sexual orientation, age or disability, in its activities, programs or employment practices as required by Title II of ADA, Title VI, Title IX and Section 504. It is the policy of the Pittsburgh Public School District to make all services, programs and activities available and to provide reasonable accommodations to persons with disabilities. For more information regarding accommodations, civil rights or grievance procedures, contact: Ms. Susan Sinicki, Manager of Employee Relations, Office of Employee Relations, Pittsburgh Public School District, 341 S. Bellefield Avenue, Pittsburgh, PA 15213-3516; Phone: (412) 622-3691 (voice/TTY/TDD); Fax: 412-622-3691.

Pittsburgh Public Schools
Excellence in Education
Volume 13, Issue 1
www.pghboe.net
First Semester 14-15


ISSC 2010, UCC, Cork, June 23–24

Recognition of Tennis Strokes using Key Postures

Damien Connaghan, Ciaran O Conaire, Philip Kelly, Noel E. O’Connor

CLARITY: Centre for Sensor Web Technologies

Dublin City University, Dublin 9, Ireland


*This work is supported by Science Foundation Ireland under grant 07/CE/I1147.

Abstract — In this paper we describe an approach for the automatic recognition of tennis strokes using a single low-cost camera. Professional tennis is played at high speed, so the ability to classify tennis strokes on camera is hindered by the rapid movement of the players. We have developed an accurate recognition system which can automatically index tennis strokes from video footage. We aim to evolve this system so that metadata, such as time codes and descriptions of the strokes played, can be automatically indexed for a training session or a match. This level of indexing would provide an excellent foundation for the development of next-generation sports coaching systems. The aim of this paper is to coarsely classify the main strokes played in tennis, i.e. a serve, forehand or backhand. The proposed approach is evaluated against a real-world dataset obtained from elite players in a competitive training match.

Keywords — action recognition, tennis

I Introduction

As video cameras become cheaper and easier to install, it becomes possible to consider their use in many domains where until recently it would have been considered impractical. One area with increasing interest in low-budget infrastructure is sports coaching, where technology can give an athlete a competitive advantage. In collaboration with Tennis Ireland [1], the national governing body for the sport of tennis in Ireland, we have developed the TennisSense system at their coaching headquarters [2]. The system can collect large volumes of video, but indexing this video into meaningful segments is very time-consuming for tennis coaches. It is therefore paramount that the system can automatically index the video into meaningful segments. Accurate shot classification provides an ideal foundation for video indexing, as annotating the shots played will lead to finer-grained video indexing.

A major research challenge is to create an accurate tennis stroke recognition system that can automatically annotate a variety of tennis strokes. This recognition system should identify key postures in each tennis stroke and then use those postures to distinguish one stroke from another, e.g. to separate serves from forehands and backhands. We do not perform classification at a finer level of granularity, such as distinguishing a flat serve, topspin serve, slice serve, topspin-slice serve or kick serve. In this paper we report on our first development of such a system.

We recorded three sessions from three high-ranked tennis players, one of whom is the current number-one ranked senior player in Ireland. Each session contained a series of serves, forehands and backhands. Using the data, we successfully trained a recognition system to classify an input stroke as a serve or not. This was achieved by calculating the distances of the input stroke from a training set consisting of serves. Previous work in this area uses Motion History Images and Motion Energy Images to represent the motion the target is engaged in [3]. Motion images, however, are not well suited to a scenario where the actions, rotations and movements of the tennis player's torso and limbs are rapid and numerous, so in our work we extend this previous approach.

It must also be highlighted that this success was accomplished with a single low-cost IP camera, which offers great potential for the development of low-budget sports coaching tools.

II Related Work

In our initial experiments to recognise tennis strokes, we created Motion History Images (MHI) and Motion Energy Images (MEI) of all the strokes as described in [3]. The process of creating motion images extracts the player as foreground from each binary image in a given sequence and joins all the foreground regions into a single frame to represent the player's movements over a given action. However, we found that the movement and rotation of the player caused the timing and precise movement information to be lost. For this reason we decided to extract the maximum amount of information from each frame to make a more informed decision.
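The following minimal NumPy sketch shows how the temporal templates of [3] summarise motion. It assumes a stack of binary foreground masks and the usual update rule: a pixel that moves in the current frame is set to a duration `tau`, and every other pixel decays by one per frame. The function name `motion_images` is illustrative and does not come from the paper. The sketch also shows how timing is lost: the whole sequence collapses into a single image, and the MEI records only where motion occurred, not when.

```python
import numpy as np


def motion_images(masks, tau):
    """Motion History Image (MHI) and Motion Energy Image (MEI) from binary masks.

    masks: array-like of shape (T, H, W) holding the foreground masks D(x, y, t).
    tau:   temporal extent of the template, in frames.
    """
    masks = np.asarray(masks, dtype=bool)
    mhi = np.zeros(masks.shape[1:], dtype=np.int32)
    for d in masks:
        # Pixels moving in this frame are set to tau; all others decay by one, floored at 0.
        mhi = np.where(d, tau, np.maximum(mhi - 1, 0))
    # Pixels that are still non-zero moved within the last tau frames; their union is the MEI.
    mei = mhi > 0
    return mhi, mei
```

Each pixel stores only its most recent motion. A limb that sweeps back over a region therefore overwrites the earlier history there, and much of the timing of a rotating player is discarded.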

Another approach uses broadcast video to classify tennis strokes [4]. Zhu et al. were able to recognise player actions based on motion analysis. To achieve this, a relationship was established between the movements of different body parts and the regions in the image plane. However, this approach only recognises two basic actions, a left swing and a right swing. We want to distinguish serves from backhands and forehands, so this approach did not suit our application.

III Real-time player extraction

a) Background subtraction

Background subtraction is a well-known computer vision technique for detecting moving foreground regions, as used in [5], [6], [7] and many others. This technique assumes that a static camera is used and that the image features of foreground objects, such as colour intensity or edge gradient information, differ from those of the background.

A basic method of background subtraction detects pixels belonging to foreground objects by checking whether the difference between pixels in the current frame, $f_i$, and the corresponding pixels in a previous image of the static background, $b_i$, exceeds a user-defined threshold $t$. A pixel $(x, y)$ is declared foreground if

$$|f_i(x, y) - b_i(x, y)| > t \qquad (1)$$

and background otherwise. In this work $t$ is chosen empirically, and background subtraction was used to create a binary image containing the player as foreground. Since we intend to use a binary image of the player as input to the next stage of our algorithm, we applied further processing to the image after background subtraction to smooth the foreground target. These techniques are discussed in the following section.
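Eq. (1) translates almost directly into array code. The sketch below assumes greyscale frames stored as NumPy arrays and a single fixed background image. The function name `subtract_background` is hypothetical. The paper chose `t` empirically and does not report its value, so any value passed here is illustrative only.

```python
import numpy as np


def subtract_background(frame, background, t):
    """Boolean foreground mask per Eq. (1): True where |f_i(x, y) - b_i(x, y)| > t.

    frame, background: greyscale images of identical shape (assumed uint8 here).
    t: threshold; the value used in the paper is not reported.
    """
    # Widen to a signed type first so that uint8 subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    return diff > t
```

The cast matters. On raw `uint8` arrays, a pixel of 100 against a background of 101 would wrap to 255 and be misreported as foreground.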

Fig. 1: Processing the video frames and smoothing the foreground yields a clean result. Panels: (a) raw frame; (b) background subtraction; (c) after the morphological open function.

b) Foreground Post-Processing

The first post-processing step is to clean up any noise within the foreground. This is done using a morphological open function [8]. This removes any small holes in the target and smooths out any noise. This approach also helps to inflate small features such as moving arm or leg segments. It was also necessary to remove the tennis ball from the foreground. This was achieved by searching the binary image for small blobs and removing any that were smaller than a predefined threshold. The processing steps are illustrated in Figure 1.
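Both post-processing steps can be sketched in plain NumPy. The paper states neither the structuring element nor the blob-size threshold. This sketch therefore assumes a 3x3 square structuring element and 8-connected blobs, and `min_area` is a hypothetical parameter. Opening here is erosion followed by dilation: it deletes foreground specks smaller than the structuring element, and blob filtering then removes any remaining components below `min_area` pixels, such as the ball.

```python
from collections import deque

import numpy as np


def _shifted_windows(mask):
    """Yield the 9 shifted views of a zero-padded mask (a 3x3 square neighbourhood)."""
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    h, w = mask.shape
    for dy in range(3):
        for dx in range(3):
            yield padded[dy:dy + h, dx:dx + w]


def erode(mask):
    # Foreground survives only where the whole 3x3 neighbourhood is foreground.
    return np.logical_and.reduce(list(_shifted_windows(mask)))


def dilate(mask):
    # A pixel becomes foreground if any pixel in its 3x3 neighbourhood is foreground.
    return np.logical_or.reduce(list(_shifted_windows(mask)))


def clean_foreground(mask, min_area):
    """Morphological opening (erode, then dilate) followed by removal of small blobs."""
    opened = dilate(erode(np.asarray(mask, dtype=bool)))
    h, w = opened.shape
    seen = np.zeros_like(opened)
    keep = np.zeros_like(opened)
    for y, x in zip(*np.nonzero(opened)):
        if seen[y, x]:
            continue
        # Breadth-first flood fill over the 8-connected component containing (y, x).
        seen[y, x] = True
        component, queue = [], deque([(y, x)])
        while queue:
            cy, cx = queue.popleft()
            component.append((cy, cx))
            for ny in range(max(cy - 1, 0), min(cy + 2, h)):
                for nx in range(max(cx - 1, 0), min(cx + 2, w)):
                    if opened[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
        # Keep only blobs of at least min_area pixels.
        if len(component) >= min_area:
            ys, xs = zip(*component)
            keep[list(ys), list(xs)] = True
    return keep
```

The pure-NumPy version avoids extra dependencies. With OpenCV installed, `cv2.morphologyEx` with `cv2.MORPH_OPEN` and `cv2.connectedComponentsWithStats` provide equivalent building blocks.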

IV Identifying Key Postures for the Training Data

Each tennis player has his/her own style for each stroke, but there are also some characteristics common to different styles. For example, in every style of serve the point of contact between the ball and racket occurs at an altitude greater than the height of the player.

Through visual observation of a variety of shots, we identified three key player poses common to all styles of serve, backhand or forehand. Figure 2 shows the three key postures common to all serves. Similar key postures were identified for backhands and forehands.

For each stroke in the training set we manually selected the three key postures to train the recognition system. This process was applied to serves, backhands and forehands.

V Developing a Recognition Framework

a) Training Data

After the video data was captured, we converted the entire video sequence into individual images. From the images, the strokes were manually segmented into serves, forehands and backhands for each player captured. Automatic segmentation is of course possible and is targeted for future work.

For a given stroke, we had an array of binary images making up the shot played by the tennis player. The number of frames that represent a tennis stroke varies, but the array contains the key postures that identify the type of stroke played, as shown in Figure 2.

b) Statistical Comparisons of Strokes

For each binary image in the group that makes up a stroke, we compute statistical descriptions of the image using the seven Hu moments [9]. Hu moments are known to offer reasonable shape discrimination in a translation- and scale-invariant manner. Once we have the Hu moments for each image in the input sequence, we can classify the input stroke.
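The following is a minimal sketch of Hu's seven invariants for a binary silhouette, assuming each foreground pixel has unit intensity. Translation invariance comes from measuring offsets relative to the centroid. Scale invariance comes from normalising each central moment $\mu_{pq}$ by $m_{00}^{1+(p+q)/2}$. The name `hu_moments` is illustrative. OpenCV users could instead call `cv2.HuMoments(cv2.moments(img, True))`.

```python
import numpy as np


def hu_moments(mask):
    """Return the seven Hu moment invariants of a binary silhouette."""
    ys, xs = np.nonzero(mask)
    if len(xs) == 0:
        raise ValueError("mask contains no foreground pixels")
    m00 = float(len(xs))  # zeroth moment = silhouette area for a binary image
    dx = xs - xs.mean()   # offsets from the centroid give translation invariance
    dy = ys - ys.mean()

    def eta(p, q):
        # Normalised central moment mu_pq / m00^(1 + (p + q) / 2) gives scale invariance.
        return np.sum(dx ** p * dy ** q) / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    a, b = n30 + n12, n21 + n03
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        a ** 2 + b ** 2,
        (n30 - 3 * n12) * a * (a ** 2 - 3 * b ** 2)
        + (3 * n21 - n03) * b * (3 * a ** 2 - b ** 2),
        (n20 - n02) * (a ** 2 - b ** 2) + 4 * n11 * a * b,
        (3 * n21 - n03) * a * (a ** 2 - 3 * b ** 2)
        - (n30 - 3 * n12) * b * (3 * a ** 2 - b ** 2),
    ])
```

Applied to every frame of a stroke, this yields a sequence of 7-dimensional vectors. These vectors are what the key-posture comparison operates on.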

Once the Hu moments are calculated, we analyze their values across a collection of serves to find the key postures that best represent a given tennis stroke, such as a serve. The most suitable key postures are those whose Hu moments are most common across the collection of serves.

To measure the similarity of the Hu moments we used the Mahalanobis distance [10]. This gives the distance between the input frame and the mean of the training set. The Mahalanobis distance is based on the correlations between variables, which allow different patterns to be identified and analyzed. The metric measures the similarity of an unclassified sample to a classified set. Unlike the Euclidean distance, it takes into account the correlations within the data set and is scale-invariant. For this reason the Mahalanobis distance is well suited to our system, as we intend to classify tennis strokes performed by different players.
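The following is a minimal sketch of the distance from one frame's Hu vector to a key-posture set. It assumes the set is stored as an `(n, 7)` array of training Hu vectors, one per training stroke, and that the reported distance is the non-squared form $\sqrt{(x-\mu)^\top \Sigma^{-1} (x-\mu)}$. The paper specifies neither choice. The pseudo-inverse is also an assumption, made because about 20 training samples in 7 dimensions can give a near-singular covariance. The function name `mahalanobis` is illustrative.

```python
import numpy as np


def mahalanobis(x, samples):
    """Mahalanobis distance from vector x to the distribution of `samples` (shape (n, d))."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)  # d x d covariance across training samples
    # pinv rather than inv, to tolerate a near-singular covariance from few samples.
    cov_inv = np.linalg.pinv(cov)
    d = np.asarray(x, dtype=float) - mean
    return float(np.sqrt(d @ cov_inv @ d))
```

Rescaling any feature (say, one Hu moment measured in much larger units) changes the Euclidean distance but leaves this distance unchanged. That property is the one the text relies on when comparing strokes from different players.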

Each frame in the input vector is compared to the three key posture sets via the Mahalanobis distance. A similarity matrix is constructed from these comparisons. Row 1 holds the comparisons of every input frame with key posture 1 of the training set, and rows 2 and 3 hold the comparisons with key postures 2 and 3 respectively. To get the best match for each key posture, we find the shortest path through each row of the matrix. A few simple rules are applied: once the lowest match for key posture 1 was identified at position K, we assumed that the closest match to key posture 2 could only exist at position K+1 or greater.
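The ordering rule can be read as a greedy, left-to-right search over the 3 x N similarity matrix. This sketch makes two assumptions beyond the text. First, key posture 3 must likewise follow key posture 2. Second, each search window stops early enough to leave at least one frame for every later key posture. A dynamic-programming search that minimises the summed distance would be an alternative reading of "shortest path". The helper name `match_key_postures` is hypothetical.

```python
import numpy as np


def match_key_postures(S):
    """Greedy, order-constrained best match of each key posture.

    S: array of shape (3, N); S[k, j] is the distance from input frame j
       to key posture k + 1.
    Returns (frame indices, distances) for key postures 1, 2 and 3.
    """
    S = np.asarray(S, dtype=float)
    n = S.shape[1]
    if S.shape[0] != 3 or n < 3:
        raise ValueError("expected a 3 x N matrix with N >= 3")
    positions, start = [], 0
    for k in range(3):
        # Leave room for the remaining key postures, which must occur later in time.
        stop = n - (2 - k)
        j = start + int(np.argmin(S[k, start:stop]))
        positions.append(j)
        start = j + 1  # the next key posture may only match at position K + 1 or later
    return positions, [float(S[k, j]) for k, j in enumerate(positions)]
```

For example, if row 2's global minimum falls before the frame matched to key posture 1, the rule ignores it and takes the best frame after K instead.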

After analyzing the moments from a series of serves, we identified a threshold value that separates strokes like those in the training set from dissimilar input strokes. By calculating the mean of the distances for the training set of serves, we set the maximum distance threshold for a candidate input serve at 12.5.
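The decision rule fits in a few lines. This sketch follows the worked example for input 9 in Section VI: a stroke is accepted as a serve when the mean of its three closest key-posture distances does not exceed 12.5. The helper name `is_serve` is hypothetical.

```python
SERVE_THRESHOLD = 12.5  # maximum mean key-posture distance for a serve


def is_serve(kp_distances, threshold=SERVE_THRESHOLD):
    """True if the mean of the closest distances to key postures 1-3 is within the threshold."""
    mean = sum(kp_distances) / len(kp_distances)
    return mean <= threshold
```

For input 9 the mean is (19.8 + 16.1 + 36.1) / 3 = 24.0, so `is_serve([19.8, 16.1, 36.1])` returns `False`. For input 1 the mean is (4.4 + 3.2 + 7.7) / 3 = 5.1, so the call returns `True`.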

We experimented with different training-set sizes, and 20 strokes produced the best results, because the larger the training set, the greater the variance in the tennis stroke. We found that ten strokes from each of two players give a good representation of a stroke.

Fig. 2: Key postures of a tennis serve: (a) throw ball; (b) mid-point; (c) post-contact. The top row contains raw frames and the bottom row contains the corresponding foreground.

VI Experimental Results

a) Visual Sensing

The video infrastructure consisted of a single IP camera with pan, tilt and zoom (PTZ) capability. The camera was positioned behind the center of the baseline on the court and is part of the TennisSense system [2]. Nine different camera positions were tested, and capturing the players' motions from behind yielded significantly better recognition results. The camera is an AXIS 215 PTZ camera, positioned 2.8 meters above the court, with high zoom functionality as well as physical pan and tilt. The high zoom is useful for focusing on the player from behind the court baseline.

b) Data Capture

For the initial data capture sessions we consulted the tennis coaches on a suitable format for capturing an adequate representation of tennis serves, forehands and backhands. They advised that a player should spend five minutes warming up in order to find their true rhythm, which would then provide us with accurate data. For this experiment, data was captured from three right-handed elite players.

Twenty serves from each player were recorded, with each player serving from left of center behind the baseline. Each player was then fed thirty balls, which were returned with a forehand or a backhand. These returns were played from a variety of locations on the court to create a realistic scenario. In total we had sixty minutes of data from three players, containing sixty serves, over sixty forehands and forty backhands.

c) Recognition Process

To recognise an input action we built a training set consisting of twenty serves, ten from each of two players.

The first test involved the recognition of twenty input strokes. These input strokes contained a mixture of forehands and previously unseen serves from three players, and the results are shown in Table 1, rows 1-20. Serves from two of these players were used to build the training set, and the third player was not represented in the training set; none of the training data was reused as testing data. Inputs 21 to 30 in Table 1 are all backhands, which are compared to the same training set of twenty serves.

For each input in Table 1, the shortest distances to key postures 1, 2 and 3 are recorded in columns KP 1, KP 2 and KP 3 respectively. The 'Result' column shows the recognition system's output, and the 'SP' (Stroke Played) column shows the type of stroke that was actually played.

To illustrate the results in Table 1, we explain how the system obtained the result for input 9. After all the frames were processed, the Hu moments of each image in the input stroke were computed. Using the Mahalanobis distance, we calculated how close each frame was to key posture 1; the closest distance was 19.8. Likewise, the closest frames to key postures 2 and 3 were found, at distances of 16.1 and 36.1 respectively. These three values have a mean of 24.0, which is above the maximum serve threshold of 12.5. A mean of 24.0 gives high confidence that this input is not a serve, and the system therefore recognised the input stroke as a forehand. The 'SP' column shows that the actual stroke was indeed a forehand.

Rows 21 to 30 are all backhand inputs, so if the stroke played is not recognised as a serve it is classified as a backhand. Table 1 shows a high level of accuracy in separating tennis serves from forehands and backhands. Of the 30 inputs, 29 are classified correctly; the only error is input 22, a backhand recognised as a serve.
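Each decision depends only on the three KP distances, so the accuracy figure can be checked directly against Table 1. The sketch below transcribes the table's KP columns and stroke-played labels exactly as printed, including the 5011 in row 5 and the 3 in row 6. It then re-applies the mean-threshold serve rule. The helper name `rescore` is hypothetical.

```python
# (KP 1, KP 2, KP 3, stroke played) for inputs 1-30, transcribed from Table 1.
TABLE_1 = [
    (4.4, 3.2, 7.7, "S"), (6.8, 3.4, 6.5, "S"), (6.6, 6.3, 4.7, "S"),
    (14.6, 15.4, 117.3, "F"), (10.0, 21.9, 5011, "F"), (42.4, 3, 179, "F"),
    (7.7, 7.5, 2.6, "S"), (5.9, 9.2, 6.6, "S"), (19.8, 16.1, 36.1, "F"),
    (3.0, 4.5, 14.3, "S"), (9.9, 10.0, 12.3, "S"), (8.8, 5.7, 11.6, "S"),
    (19.4, 9.6, 6.9, "S"), (25.0, 17.3, 3.6, "F"), (5.9, 7.3, 3.8, "S"),
    (2.7, 7.7, 24.3, "S"), (31.3, 11.1, 22.2, "F"), (44.3, 7.4, 24.0, "F"),
    (16.7, 64.4, 297, "F"), (6.5, 2.6, 2.7, "S"),
    (40.9, 15.2, 602.34, "B"), (16.2, 16.1, 4.6, "B"), (39.4, 8.9, 32.8, "B"),
    (29.6, 11.0, 117, "B"), (16.9, 10.9, 57.7, "B"), (18.0, 17.1, 63.2, "B"),
    (11.0, 6.0, 195, "B"), (32.7, 16.2, 173, "B"), (38.2, 12.5, 21.6, "B"),
    (38.2, 15.9, 28.8, "B"),
]


def rescore(rows, threshold=12.5):
    """Apply the mean-distance serve rule to each row and return the 1-based
    indices of inputs whose serve / non-serve decision disagrees with the stroke played."""
    errors = []
    for i, (kp1, kp2, kp3, played) in enumerate(rows, start=1):
        predicted_serve = (kp1 + kp2 + kp3) / 3 <= threshold
        if predicted_serve != (played == "S"):
            errors.append(i)
    return errors


errors = rescore(TABLE_1)
print(f"{len(TABLE_1) - len(errors)} of {len(TABLE_1)} correct; errors at {errors}")
# prints: 29 of 30 correct; errors at [22]
```

Input 22 fails because its mean distance, (16.2 + 16.1 + 4.6) / 3 = 12.3, falls just under the 12.5 threshold.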

VII Conclusions and Future Work

The experimental analysis, although performed on a relatively small number of players, indicates a high level of accuracy, as illustrated by Table 1.

As a first attempt, the system served well in bringing to light the issues to be considered in future research. At present we cannot classify the different types of stroke played. A future challenge will be to distinguish a topspin serve from a slice serve, for example, or a backhand from a forehand. Given that low-cost cameras are used here, it will also be interesting to see how effective multiple cameras are in solving the underlying research challenges posed by identifying different types of tennis strokes.

In this paper the key postures were identified manually, by inspecting the frames and visually identifying similar frames across multiple serves from different players. Whilst the similarity in Hu moments was used to verify the similarity of these key postures, a more advanced approach would be to build a system that identifies the best match for key postures and thus removes the need for visual inspection. Given the computational overhead and the time involved, this task was out of the scope of this paper, but it will be an essential requirement for the future progress of this automated recognition system.

Input   KP 1   KP 2     KP 3   Result  SP
    1    4.4    3.2      7.7   S       S
    2    6.8    3.4      6.5   S       S
    3    6.6    6.3      4.7   S       S
    4   14.6   15.4    117.3   F       F
    5   10.0   21.9     5011   F       F
    6   42.4      3      179   F       F
    7    7.7    7.5      2.6   S       S
    8    5.9    9.2      6.6   S       S
    9   19.8   16.1     36.1   F       F
   10    3.0    4.5     14.3   S       S
   11    9.9   10.0     12.3   S       S
   12    8.8    5.7     11.6   S       S
   13   19.4    9.6      6.9   S       S
   14   25.0   17.3      3.6   F       F
   15    5.9    7.3      3.8   S       S
   16    2.7    7.7     24.3   S       S
   17   31.3   11.1     22.2   F       F
   18   44.3    7.4     24.0   F       F
   19   16.7   64.4      297   F       F
   20    6.5    2.6      2.7   S       S
   21   40.9   15.2   602.34   B       B
   22   16.2   16.1      4.6   S       B
   23   39.4    8.9     32.8   B       B
   24   29.6   11.0      117   B       B
   25   16.9   10.9     57.7   B       B
   26   18.0   17.1     63.2   B       B
   27   11.0    6.0      195   B       B
   28   32.7   16.2      173   B       B
   29   38.2   12.5     21.6   B       B
   30   38.2   15.9     28.8   B       B

Table 1: Inputs 1-20 are a mixture of forehands and previously unseen serves, and inputs 21-30 are backhands only. The recognition system identifies whether the stroke played is a serve. KP = Key Posture; SP = Stroke Played; S = serve, F = forehand, B = backhand.

References

[1] Tennis Ireland. http://www.tennisireland.ie.

[2] D. Connaghan et al. A sensing platform for physiological and contextual feedback to tennis athletes. In BSN, pages 224–229, 2009.

[3] J.W. Davis and A.F. Bobick. The representation and recognition of action using temporal templates. In CVPR, pages 928–934, 1997.

[4] G. Zhu et al. Player action recognition in broadcast tennis video with applications to semantic analysis of sports game. In ACM Multimedia, pages 431–440, 2006.

[5] C. O Conaire, P. Kelly, D. Connaghan, and N. O’Connor. TennisSense: A platform for extracting semantic information from multi-camera tennis data. In DSP, 2009.

[6] P. Kelly, N. E. O’Connor, and A.F. Smeaton. Robust pedestrian detection and tracking in crowded scenes. Image and Vision Computing Journal, 27(10):1445–1458, 2009.

[7] T. Bloom and A.P. Bradley. Player tracking and stroke recognition in tennis video. In VSSN, pages 93–94, 2006.

[8] J. Gil and R. Kimmel. Efficient dilation, erosion, opening, and closing algorithms. In Pattern Analysis and Machine Intelligence, pages 1606–1617, 2002.

[9] M. Hu. Visual pattern recognition by moment invariants. In IRE Trans. Information Theory, volume 8, pages 179–187, 1962.

[10] S. Xiang, F. Nie, and C. Zhang. Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recognition, 41(12):3600–3612, 2008.