Automatic Key Frame Extraction From Videos For Efficient Mouse Pain Scoring

Marcin Kopaczka1, Lisa Ernst2, Jakob Heckelmann1, Christoph Schorn1, René Tolba2, Dorit Merhof1

Abstract— Laboratory animals used for experiments need to be monitored closely for signs of pain and distress. A well-established score is the mouse grimace scale (MGS), a method where defined morphological changes of the rodent's eyes, ears, nose, whiskers and cheeks are assessed by human experts. While proven to be highly reliable, MGS assessment is a time-consuming task requiring manual processing of videos for key frame extraction and subsequent expert grading. While several tools have been presented to support this task for white laboratory rats, no methods are available for the most widely used mouse strain (C57BL/6), which is inherently black. In our work, we present a set of methods to aid the expert in the annotation task by automatically processing a video and extracting images of single animals for further assessment. We introduce algorithms for separating an image potentially containing multiple animals into single subimages displaying exactly one mouse. Additionally, we show how a fully convolutional neural network and a subsequent grading function can be designed to select frames that show a profile view of the mouse and therefore allow convenient grading. We evaluate our algorithms and show that the proposed pipeline works reliably and allows fast selection of relevant frames.

I. INTRODUCTION AND PREVIOUS WORK

While a large research effort is put into the development of protocols that do not require the use of laboratory animals, experiments on animals are still a crucial part of many pre-clinical trials. Strict international regulations ensure that trial managers put an emphasis on animal well-being by demanding strict stress monitoring over the duration of each trial. Stress monitoring is standardized using well-defined protocols based on stress research [1]. Following the procedures described in these guidelines is a time-consuming task, often including animal handling, additional measurements and close screening of video recordings. To ensure objectiveness, it is recommended that many of the tasks defined by the protocols are carried out by the same expert, a requirement that further increases the complexity of experiment planning. In recent years, stress recognition research has focused on observing facial areas such as the eyes in order to develop reliable stress detection protocols. The work has been inspired by research carried out on humans, for which psychologists were able to show that a universal face of pain exists [2] that is controlled by mechanisms that make it cross-cultural and even remain detectable in patients with dementia [3], indicating its fundamental nature. Following these findings, morphological changes in the facial

1 Institute of Imaging and Computer Vision, RWTH Aachen University, Templergraben 55, 52074 Aachen, Germany. [email protected]

2 Institute for Laboratory Animal Science, RWTH Aachen University, Templergraben 55, 52074 Aachen, Germany

Fig. 1. A sample image from our grimace scale assessment recordings. Four mice in acrylic boxes are filmed frontally against a red background.

appearance of different animals have been analyzed, resulting in a substantial number of findings. As a result, a number of grimace scales for different animals has been proposed, starting with the mouse grimace scale (MGS) for laboratory mice [4], which has been proven to be highly reliable in practical use [5] and followed by similar research published for pain assessment in rats [6], [7] and rabbits [8], but also in larger animals such as horses [9], sheep [10] and pigs [11].

With grimace analysis being an established pain assessment method, many trials rely on measuring grimace scale parameters reliably. Usually, grimace scale assessment is performed by trained personnel using still frames extracted from a video stream. This procedure requires a well-defined workflow and setup for high-resolution image acquisition, video processing and image rating. Performing grimace scale grading and behavioral analysis entirely manually is a time-consuming task; therefore, a number of tools for automated or semi-automated image grading have been proposed by both academic and commercial software developers. In the following, we present a number of algorithms and software packages for automated assessment of rodent behavior:

The Mice Profiler [12] is a set of algorithms that allows


tracking of two mice in an enclosed area. The animals are detected and modeled as a set of geometric primitives, allowing tracking and automated behavioral analysis based on body posture variance that is detected by analyzing the primitive configuration. Social interactions can be analyzed as well by evaluating the spatial relation and temporal behavior of both detected animals. The work by Unger et al. [13] also allows tracking of two mice in a laboratory setup. The method is based on an active shape model that is extended by a method for detecting and resolving collisions, which is required when both animals get into close body contact with each other, a common problem seen in social experiments with rodents. This way, different social interactions are detectable. Additionally, behavioral analysis is performed by evaluating the active shape model's parameters, allowing detection of social and individual behavior such as self-grooming, which can be an indicator for disorders. A set of widely used commercial tools for rodent behavioral analysis is provided by the Noldus company, with the EthoVision system [14] being one of them.

All of the above described methods have been developed for the automated behavioral analysis of rodents in top-down scenarios such as open field tests, giving researchers a wide range of tools to aid them in conveniently evaluating videos acquired in these scenarios. However, grimace scale scoring requires a different recording setting. Animals need to be recorded frontally with a high-resolution camera and cage setup that allows detection and assessment of subtle grimace changes. Usually, small boxes made of acrylic glass are used as animal containers for the duration of the grimace scale recording. After acquiring the video data, the frames are manually screened and relevant individual frames are exported for assessment. Based on interviews with experts, manual video screening and frame selection are the most time-consuming parts of the procedure. This bottleneck has been identified and addressed in [7], where a grimace scale for rats inspired by the mouse grimace scale has been presented together with an automated method for key frame selection. The selection method uses the Viola-Jones face detection algorithm [15] trained on rat image data to detect eyes, ears and the whole head. Frames with good detection results are then saved for manual assessment, i.e. no automatic pain scoring based on the described grimace scale is performed. To the best of our knowledge, the only method presented so far for automated pain scoring based on grimace features has been presented for sheep in [16], where HOG features from facial regions are computed and analyzed using a support vector machine.

Our experiments in applying the methods from the above publications to our image data showing black lab mice (C57BL/6, a widely used mouse strain) have shown that none of the available algorithms was suitable for face detection or automated classification. The reason was that the Haar and HOG feature descriptors used in these publications were not robustly applicable to black mice due to a lack of contrast. Therefore, novel methods needed to be developed, which is the contribution of this work. We introduce a set of methods

that form a pipeline allowing automated processing of mouse videos and automatic extraction of a set of images for efficient grimace scale assessment. To the best of our knowledge, we are also the first to present methods for facial image assessment of black mice.

This paper is structured as follows: Following this introduction and overview of available approaches, we describe the imaging setup used to acquire the videos and our image processing pipeline for key frame extraction in Section II. Quantitative and qualitative results are presented in Section III, followed by a conclusion in Section IV and a discussion and final remarks in Section V.

II. MATERIALS AND METHODS

Here, we first describe the imaging setup designed to acquire high quality recordings while staying compliant with restrictions implied by animal handling requirements. Afterwards, we give a detailed description of the pipeline developed to process the images and extract key frames.

A. Image Acquisition

The key requirement of the imaging setup was the acquisition of high-resolution frames under lighting restrictions imposed by animal handling regulations that limit the maximal light intensity in labs working with mice. Iteratively improving the setup and re-assessing the acquired images resulted in the following optimized setup: The mice are placed in small acrylic boxes designed following the description given in the mouse grimace scale definition. Up to four boxes are placed in a holding frame to allow efficient parallel recording of multiple animals. The frame is placed inside a white light tent to even out incoming light and to reduce reflections. A camera capable of recording videos at a resolution of 1920 x 1080 pixels at 30 frames per second is placed outside the tent and zoomed in to record all cages simultaneously. Finally, a red backdrop is placed behind the cages to increase the contrast in the red color channel, since preliminary experiments have shown that the fur of the filmed C57BL/6 mice shows maximal contrast in the red channel. Note that the backdrop color is perceived as black by the mice, since the spectral range of mouse eyes does not cover wavelengths interpreted as red by humans and RGB sensors.

B. Image Processing Pipeline

The pipeline consists of several steps, which are displayed in Fig. 2. In detail, we perform the following steps:

• Subimage slicing - First, the image needs to be cropped automatically into smaller pieces, with each subimage displaying only one box. To this end, a yellow marker that is placed in the center of the holding frame is detected by converting the image color space to HSV, thresholding the image in each of the three channels and combining the results using a pixel-wise AND operation. Binary regions are detected in the resulting bitmap and the coordinates of the largest connected region's centroid are used to slice the image vertically and horizontally into four subimages. A minimal sketch of this step is given below.
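The following Python sketch illustrates the slicing step, assuming OpenCV and NumPy; the HSV threshold bounds are illustrative placeholders, as the paper does not report the exact values used.

import cv2
import numpy as np

def slice_into_subimages(frame_bgr):
    """Split a four-box frame into subimages around the yellow center marker."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Per-channel thresholds combined pixel-wise; inRange ANDs the three
    # channel comparisons internally. Bounds are placeholder values.
    mask = cv2.inRange(hsv, np.array([20, 100, 100]), np.array([35, 255, 255]))
    # Find the largest connected region and take its centroid as the marker.
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(mask)
    if n < 2:
        return []  # no marker found
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    cx, cy = (int(round(c)) for c in centroids[largest])
    # Slice vertically and horizontally at the centroid into four subimages.
    return [frame_bgr[:cy, :cx], frame_bgr[:cy, cx:],
            frame_bgr[cy:, :cx], frame_bgr[cy:, cx:]]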


Fig. 2. Our image processing pipeline. A detailed description of each step is given in Section II-B.

• Box detection - In several recordings, not all slots of the holding frame contain a box with an animal in it. To increase performance, image series that contain no animal are detected and excluded from further processing. Box detection is performed by applying Canny edge detection [17] to the red channel of each subimage, followed by template matching with a box template (Fig. 3). Subsequently, the maximal correlation coefficient is computed for each image. Images with a coefficient lower than an experimentally defined value (see Section III for the precise evaluation) are dropped, as they contain no animal. A sketch of this step follows Fig. 3.

Fig. 3. Box detection using edge detection and template matching. Top left: red channel of the original image. Top right: Canny detector output. Bottom left: box template for correlation computation. Bottom right: bounding box centered at the coordinates of the correlation maximum.
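A minimal sketch of the box detection step, again assuming OpenCV; the Canny thresholds and the template matching variant are assumptions, since the paper specifies neither (the threshold of 7.5 reported in Section III suggests an unnormalized correlation statistic).

import cv2

def box_score(subimage_bgr, box_template_edges):
    """Maximal template matching response on the Canny edge map of the
    red channel. Thresholds and matching method are assumptions."""
    red = subimage_bgr[:, :, 2]            # OpenCV stores channels as BGR
    edges = cv2.Canny(red, 50, 150)
    # Correlate the edge map with the (edge-based) box template.
    response = cv2.matchTemplate(edges, box_template_edges, cv2.TM_CCOEFF)
    return float(response.max())

# Subimages whose score falls below the experimentally chosen threshold
# are dropped as containing no box.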

• Eye detection - As assessment of the eyes (orbital tightening) is a key component of grimace scale scoring, the score for image grading is based on eye visibility. To localize the eyes in the images, we designed and trained a fully convolutional neural network (FCN) for semantic segmentation. The network design was optimized iteratively with both localization performance and computational complexity in mind. The final architecture is shown in Table I. A sample output of the network is shown in Fig. 4. To improve detection performance, manual annotations were given an additional "don't care" label around the marked eye positions to improve the inter-class difference between eyes and background or fur texture. A Keras-style sketch of this architecture is given after Fig. 4.

TABLE I
NETWORK ARCHITECTURE

Layer Type
Input: (X, Y, 3)
Conv2D, 64 filters, 3x3 kernel
Conv2D, 32 filters, 3x3 kernel
MaxPooling, 2x2
Dropout, 1/4
Conv2D, 64 filters, 3x3 kernel
Conv2D, 64 filters, 3x3 kernel
MaxPooling, 2x2
Dropout, 1/4
UpSampling2D, (4, 4)
Conv2D, 32 filters, 3x3 kernel
Dropout, 1/4
Conv2D, 2 filters, 3x3 kernel
Output: Softmax2D

Fig. 4. FCN performance. Left: original input image. Center: manually defined ground truth segmentation (black - background, white - eye, gray - ignore). Right: FCN output.
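The layer names in Table I follow Keras conventions, so a minimal Keras sketch of the architecture could look as follows; the activation functions and padding mode are assumptions, as the table does not list them.

from tensorflow.keras import layers, models

def build_fcn():
    # Fully convolutional, so the spatial input size (X, Y) stays flexible.
    inp = layers.Input(shape=(None, None, 3))
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(inp)
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Dropout(0.25)(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.Conv2D(64, (3, 3), padding="same", activation="relu")(x)
    x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Dropout(0.25)(x)
    x = layers.UpSampling2D((4, 4))(x)   # undo the two 2x2 poolings
    x = layers.Conv2D(32, (3, 3), padding="same", activation="relu")(x)
    x = layers.Dropout(0.25)(x)
    x = layers.Conv2D(2, (3, 3), padding="same")(x)
    # Per-pixel two-class softmax over eye vs. background.
    out = layers.Softmax(axis=-1)(x)
    return models.Model(inp, out)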

• Key frame extraction - In a final step, images that are well suited for grimace scale-based pain assessment are extracted. In order to define a quantitative measure for image grading, veterinary experts with experience in grimace scale assessment were interviewed. The experts agreed that in order to be well suited for assessment, the image should show a side view of the animal, as this perspective allows optimal grading based on all of the grimace scale's sub-scores. Therefore, our proposed frame extraction algorithm rejects all images with more than one detected eye, as these images show frontal views of the mouse. Subsequently, a score defined as the number of all pixels for which the FCN's final softmax layer returned 1 is computed for all non-rejected images. To ensure that the frames are not sampled from the same sub-sequence in the video, non-maximum suppression is applied to the computed scores. Finally, the output images are sorted in descending order according to their score, allowing an arbitrary number N of the best frames in the video to be drawn. A sketch of the scoring and selection logic follows below.
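The following sketch implements these rules; the suppression window size is an assumption, as the paper does not report its non-maximum suppression parameters.

import numpy as np
from scipy import ndimage

def select_key_frames(eye_masks, n_best, nms_window=30):
    """Score and select key frames from per-frame binary FCN eye masks.

    eye_masks: list of binary arrays (1 = pixel classified as eye).
    nms_window: suppression radius in frames (assumed value).
    """
    scores = np.zeros(len(eye_masks))
    for i, mask in enumerate(eye_masks):
        n_eyes = ndimage.label(mask)[1]      # connected eye regions
        if n_eyes <= 1:                      # reject frontal views (2 eyes)
            scores[i] = mask.sum()           # eye-pixel count as score
    # Greedy 1-D non-maximum suppression over the temporal score curve,
    # so selected frames come from different sub-sequences of the video.
    selected, s = [], scores.copy()
    while len(selected) < n_best and s.size and s.max() > 0:
        i = int(s.argmax())
        selected.append(i)
        s[max(0, i - nms_window):i + nms_window + 1] = 0
    return selected  # frame indices, best score first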

III. EXPERIMENTS AND RESULTS

To evaluate our algorithms, a total of 202 images have been exported from video files and mouse eyes have been manually annotated in all images. Subsequently, all algorithms of our pipeline were evaluated using this image data.

A. Subimage slicing

The slicing procedure succeeded on all images of the dataset, returning 808 valid subimages.

B. Box detection

The cross-correlation score was computed for all subimages. As shown in Fig. 5, two distributions can be seen in the resulting scores. The small distribution with lower scores corresponds exactly to the images containing no box, while all images containing a box are in the higher-ranging distribution. Using a threshold of 7.5 differentiates between both groups with no misclassifications.

Fig. 5. Cross-correlation coefficients of the subimages. Blue - images containing a box with a mouse in them. Green - images with no box.

C. FCN performance

To assess FCN performance, the network has been trained on the 808 generated subimages. For evaluation, eye positions have been marked in 240 additional subimages. The localization succeeded in all cases, with a maximal distance of 5 pixels between the centers of the automatically localized and manually annotated eye positions. The average distance between both eye positions was 2.21 pixels. A histogram of the results is shown in Fig. 6.
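As a sketch of this evaluation metric, the predicted eye center can be taken as the centroid of the FCN's eye segmentation and compared against the annotation; this derivation of the predicted center is an assumption, as the paper does not state it.

import numpy as np
from scipy import ndimage

def eye_center_distance(pred_mask, gt_center):
    """Euclidean distance (in pixels) between the centroid of the predicted
    eye region and the annotated eye center (row, col)."""
    cy, cx = ndimage.center_of_mass(pred_mask)
    return float(np.hypot(cy - gt_center[0], cx - gt_center[1]))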

D. Key Frame Extraction

The extracted key frames were assessed qualitatively by experienced mouse grimace scale annotators. The experts agreed that the extracted frames are well suited for grimace scale assessment.

Fig. 6. Histogram of the distances between manually annotated and automatically detected eye centers.

E. Benchmarking

A performance measurement of our pipeline on an Intel i5-760 CPU at 2.80 GHz and a GeForce 980Ti graphics card has shown that our algorithm can process a full frame (four subimages) in about 1.3 seconds. When one frame per second is exported for automated assessment, the pipeline can evaluate a video automatically at close to real time, thereby nullifying the time that currently needs to be invested into manual frame selection and drastically decreasing the amount of time between video acquisition and availability of frames for scoring, allowing faster reactions to score changes visible in the videos.

IV. CONCLUSION

Our evaluation has shown that our proposed pipeline allows a robust and efficient selection of key frames for grimace scale assessment. Our preprocessing methods for subimage generation and box detection allow a reliable generation of single-mouse images for assessment, while score computation based on FCN-driven eye detection allows picking the frames that are best suited for assessment. Qualitative analysis of the final images performed by experienced manual annotators shows good expert acceptance of our selected images.

V. DISCUSSION

In our work, we presented a set of novel methods for the automated selection of key frames for efficient grimace scale rating of the widely used C57BL/6 mouse strain. Image preprocessing for subimage generation and box detection was performed to generate images containing single animals only, while a fully convolutional neural network was designed for eye detection. A score based on the FCN output was introduced to automatically select frames that are well suited for annotation. The proposed algorithms work reliably on our presented dataset and greatly help in reducing the time needed for selection of frames for mouse grimace scale assessment.


REFERENCES

[1] T. Burkholder, C. Foltz, E. Karlsson, C. G. Linton, and J. M. Smith, "Health evaluation of experimental laboratory mice," Current Protocols in Mouse Biology, pp. 145–165, 2012.

[2] K. M. Prkachin, "Assessing pain by facial expression: facial expression as nexus," Pain Research and Management, vol. 14, no. 1, pp. 53–58, 2009.

[3] M. Kunz, S. Scharmann, U. Hemmeter, K. Schepelmann, and S. Lautenbacher, "The facial expression of pain in patients with dementia," PAIN, vol. 133, no. 1, pp. 221–228, 2007.

[4] D. J. Langford, A. L. Bailey, M. L. Chanda, S. E. Clarke, T. E. Drummond, S. Echols, S. Glick, J. Ingrao, T. Klassen-Ross, M. L. LaCroix-Fralish et al., "Coding of facial expressions of pain in the laboratory mouse," Nature Methods, vol. 7, no. 6, pp. 447–449, 2010.

[5] A. L. Miller and M. C. Leach, "The mouse grimace scale: a clinically useful tool?" PLoS ONE, vol. 10, no. 9, p. e0136000, 2015.

[6] A. L. Whittaker and G. S. Howarth, "Use of spontaneous behaviour measures to assess pain in laboratory rats and mice: How are we progressing?" Applied Animal Behaviour Science, vol. 151, pp. 1–12, 2014.

[7] S. G. Sotocinal, R. E. Sorge, A. Zaloum, A. H. Tuttle, L. J. Martin, J. S. Wieskopf, J. C. Mapplebeck, P. Wei, S. Zhan, S. Zhang et al., "The rat grimace scale: a partially automated method for quantifying pain in the laboratory rat via facial expressions," Molecular Pain, vol. 7, no. 1, p. 55, 2011.

[8] S. C. Keating, A. A. Thomas, P. A. Flecknell, and M. C. Leach, "Evaluation of EMLA cream for preventing pain during tattooing of rabbits: changes in physiological, behavioural and facial expression responses," PLoS ONE, vol. 7, no. 9, p. e44437, 2012.

[9] E. Dalla Costa, M. Minero, D. Lebelt, D. Stucke, E. Canali, and M. C. Leach, "Development of the horse grimace scale (HGS) as a pain assessment tool in horses undergoing routine castration," PLoS ONE, vol. 9, no. 3, p. e92281, 2014.

[10] K. M. McLennan, C. J. Rebelo, M. J. Corke, M. A. Holmes, M. C. Leach, and F. Constantino-Casas, "Development of a facial expression scale using footrot and mastitis as models of pain in sheep," Applied Animal Behaviour Science, vol. 176, pp. 19–26, 2016.

[11] P. Di Giminiani, V. L. Brierley, A. Scollo, F. Gottardo, E. M. Malcolm, S. A. Edwards, and M. C. Leach, "The assessment of facial expressions in piglets undergoing tail docking and castration: toward the development of the piglet grimace scale," Frontiers in Veterinary Science, vol. 3, 2016.

[12] F. De Chaumont, R. D.-S. Coura, P. Serreau, A. Cressant, J. Chabout, S. Granon, and J.-C. Olivo-Marin, "Computerized video analysis of social interactions in mice," Nature Methods, vol. 9, no. 4, pp. 410–417, 2012.

[13] J. Unger, M. Mansour, M. Kopaczka, N. Gronloh, M. Spehr, and D. Merhof, "An unsupervised learning approach for tracking mice in an enclosed area," BMC Bioinformatics, vol. 18, no. 272, pp. 1–14, 2017.

[14] A. Spink, R. Tegelenbosch, M. Buma, and L. Noldus, "The EthoVision video tracking system: a tool for behavioral phenotyping of transgenic mice," Physiology & Behavior, vol. 73, no. 5, pp. 731–744, 2001.

[15] P. Viola and M. J. Jones, "Robust real-time face detection," International Journal of Computer Vision, vol. 57, no. 2, pp. 137–154, 2004.

[16] Y. Lu, M. Mahmoud, and P. Robinson, "Estimating sheep pain level using facial action unit detection," in 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), May 2017, pp. 394–399.

[17] J. Canny, "A computational approach to edge detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. PAMI-8, no. 6, pp. 679–698, 1986.