Research Article
Video Retrieval System for Meniscal Surgery to Improve Health Care Services
Sana Amanat,1 Muhammad Idrees,2 Muhammad Usman Ghani Khan,1 Zahoor Rehman ,3
Hangbae Chang,4 Irfan Mehmood ,5 and Sung Wook Baik5
1Department of Computer Science and Engineering, University of Engineering and Technology Lahore, Lahore, Pakistan
2Department of Computer Science and Engineering, University of Engineering and Technology Lahore Narowal Campus, Lahore, Pakistan
3Department of Computer Science, Comsats Institute of Information Technology Attock Campus, Attock, Pakistan
4Department of Industrial Security, College of Business and Economics, Chung-Ang University, Seoul, Republic of Korea
5Department of Software, Sejong University, Seoul, Republic of Korea
Correspondence should be addressed to Irfan Mehmood; [email protected]
Received 18 January 2018; Accepted 29 April 2018; Published 14 June 2018
Academic Editor: Ka L. Man
Copyright © 2018 Sana Amanat et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Meniscal surgery is among the most common orthopedic procedures; it deals with the treatment of meniscus tears, a frequent contusion to the cartilage that stabilizes and cushions the human knee joint. Such tears can be classified into different categories based on age group, region, and occupation. A large number of sportsmen and heavy weightlifters, even in developed countries, are affected by meniscus injuries. These patients undergo arthroscopic surgery, during which the preservation of the meniscus is a crucial task. Current research reports a significant ratio of meniscal tear patients around the globe, and this ratio has risen strikingly, with a mean annual rate of 0.066%, due to surgery failure. To reduce this ratio, an innovative training mechanism based on a video retrieval system is proposed in this research. This work focuses on developing a corpus and video retrieval system for meniscus surgery. Using the proposed system, surgeons can obtain guidance by watching videos of surgeries performed by experts and their seniors. The proposed system comprises four spatiotemporal approaches to improve health care services: key point, statistical modeling, PCA-scale invariant feature transform (SIFT), and PCA-Gaussian mixture model (GMM), each combined with sparse optical flow. A real meniscal surgery dataset is used for testing and evaluation. The results show that the PCA-SIFT approach performs best, with an average precision of 0.78.
1. Introduction
A meniscus tear is a common lesion of the cartilage that cushions and stabilizes the human knee joint. It is a frequent disorder across the globe, with a mean annual rate of 0.066%; notably, about 66 meniscus injury incidences per 100,000 people are observed worldwide. These incidences can be categorized by various aspects, that is, age group, region, and occupation. Most affected sportsmen and heavy weightlifters younger than 40 years in developed countries recovered through meniscus surgery, covering North America, Latin America, Asia, and Africa. In Denmark, on the other hand, more degenerative changes in the annual incidence are observed: the incidence rose from 164 to 312 over the past 10 years, and about 2 million cases of meniscus surgery treatment were observed. Details of registered patients, diagnoses, and procedures maintained by the Danish National Patient Register can be found in [1] and Summit (2016).
Hindawi, Journal of Sensors, Volume 2018, Article ID 4390703, 10 pages, https://doi.org/10.1155/2018/4390703

Previous studies have reported a very high rate of repeat surgery due to the failure of meniscus surgery. Different causes have been observed for this failure; one of the major causes is inappropriate removal of the meniscus, because inexperienced surgeons do not know how much, or which part, has to be removed. More precisely, it deals with the interaction of tissues and radiation, which may be represented in the form of static images, videos, or recordings. These recordings can be used to train surgeons and practitioners to perform successful surgeries. One way to access these recordings is through a video retrieval system that aims to retrieve videos from a multimedia repository. Upon a surgeon's request, a video retrieval system can provide the most relevant and accurate videos of meniscal surgery. There have been several studies on surgery video retrieval, that is, for the eyes, lungs, heart, and kidney, but meniscal surgery holds a unique position: it is common in most countries, yet neither a corpus nor a video retrieval system specifically for meniscal surgery is found in the literature [2].
To retrieve retinal surgery videos using compressed video streams, the required motion information is extracted at regular intervals. Precisely, two types of motion information are excerpted: one is the region trajectory and the other comprises the residual error. A heterogeneous feature vector is used to combine the motion information, and sequence segmentation is utilized to extract it. The Kalman filter and generalized Gaussian distribution are utilized to formulate the region trajectory and residual information. In the last step, an extension of fast dynamic time warping is used along with the feature vector to compare videos [3].
Another approach to retrieving medical videos first extracts a key frame; its global features are then excerpted and saved in XML script format as an object to reduce storage space. Furthermore, a neural network and naïve Bayes are used to enhance performance. Features of an image are extracted using entropy, Huffman coding, and fast Fourier transforms [4].
In order to retrieve laparoscopy videos from a colossal repository, a feature signature is utilized which is composed of abstract and local details of the image. Initially, position, colour, and texture features of an image are considered for this signature. A weight is assigned to each feature, describing its potential to serve as a signature representative. To compare two signatures, earth mover's distance, signature quadratic form distance, and signature matching distance are used in combination [5]. To retrieve endoscopy videos from a repository, a feature signature is computed based on global features, that is, colour, position, and texture. A weight is assigned to each feature, and zero-weighted features are considered irrelevant for the signature; features with nonzero weight are denoted as signature representatives. Furthermore, the cluster centroids of these features are used as feature representatives. To measure the similarity between endoscopic images, signature matching distance and adaptive binning feature signatures are used on the basis of visual characteristics [6].
In another approach to retrieving endoscopic videos, local, global, and feature fusion techniques are employed. To extract features, various algorithms such as CEDD, SURF, and SIMPLE are used. Additionally, early and late fusion approaches are considered. These approaches were tested on 1276 video clips. The LIRE software library, which can extract 20 visual features of an image, is used to index test frames [7].
To retrieve cataract surgery videos from a repository, a video is divided into a number of overlapping clips, each containing the same number of frames. Various features are extracted from these frames using the histogram of oriented gradients (HOG), BoVW, and motion histograms. These features are statistically modeled using Bayesian networks, CRFs, and HMMs. Thirty cataract surgery videos of 720 × 576 pixels were used for testing. A mean area under the ROC curve of 0.61 was achieved in recognizing surgical steps from the video stream [8]. In order to retrieve and analyse hysteroscopy videos of the female reproductive system, a video summarization framework was proposed. To compute the feature set, colour information is extracted from key frames using the COC colour model. Other salient features such as texture, motion, curvature, and contrast are also extracted [9].
Another approach to retrieving cataract videos formulates a feature set using global features such as colour, texture, and motion information of the frame. Furthermore, the wavelet transform is performed on the extracted colour channels. In this study, cataract and epiretinal surgeries were considered [10]. To retrieve similar videos in the domain of retinal surgery, a motion-based approach is found in the literature which utilizes MPEG-4 to extract features from the frame. For video encoding, groups of pictures and macroblocks are used, where three types of frames are considered. Motion histograms, classification, and motion trajectories are used to formulate the motion feature set. Furthermore, signatures are composed, and to measure their similarity ratio, a fast dynamic time warping approach is used. This approach was applied to epiretinal membrane surgery.
In order to control the failure rate of surgery and to overcome the mentioned drawbacks, there is a dire need for a sophisticated approach. It could be realized by building a connection between information technology and the medical sciences. Information technology is capable of providing medical diagnosis, clinical analysis, and treatment planning through a practice known as medical imaging [11, 2, 12].
Our proposed system firstly focuses on spatial features, that is, the location of objects in reference to static objects [13]. Secondly, it deals with the temporal information of an image, which provides a glimpse of moving and operational objects [14]. The system comprises four methods: interest point, statistical modeling, PCA-SIFT, and PCA-GMM-based video retrieval. The first method incorporates interest points of videos, which are excerpted using up-to-date feature description and detection algorithms known as SIFT, HOG, and the covariant feature detector (CFD) with distance transform (DT), addressed by Bai et al. [15] and Bharathi et al. [16]. The second method encompasses statistical modeling of features using GMM and Kullback-Leibler divergence. In the third and fourth methods, a dimension reduction technique is integrated with SIFT and GMM, respectively. To extract temporal features, Lucas-Kanade sparse optical flow is incorporated in each method [17]. This proposed system can help new
and less-experienced surgeons to get appropriate information about surgical methods, procedures, and tools to perform a successful surgery [11, 2, 12]. These meniscal recordings can additionally be used to brief patients and in their follow-up sessions [9].
Table 1 gives a summarized comparison of previous video retrieval approaches. It is clear from this comparison that many systems exist for medical video retrieval, but a video retrieval system for meniscal surgery practitioners is not found despite the need for one around the globe. In our system, we also use a hybrid of state-of-the-art techniques for feature detection, recognition, and comparison. The most important conclusion of this summary is a step towards a new research direction.
2. Materials and Methods
This section sheds light on the proposed methodology for meniscal surgery video retrieval. A surgical video is considered to be composed of frames which are chronologically bound together to tell a tale, that is, the live surgical process. Each frame grasps an instant of surgery, producing a coherent association with contiguous frames. Algorithms in this framework are applied to frames instead of whole videos, because the frame is the building block of a video. Therefore, the videos are decomposed into frames, and shot boundaries are detected using XOR. As a result, collections of master frames depicting master shots are extracted. These master frames are further utilized to model the spatiotemporal features of a particular video. Details of the spatiotemporal feature extraction and modeling techniques are as follows.
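The XOR-based shot boundary step above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the binarization level and the fraction of changed pixels that signals a new shot are assumed values.

```python
import numpy as np

def shot_boundaries(frames, change_thresh=0.3):
    """Detect shot boundaries by XOR-ing consecutive binarized frames.

    frames: sequence of 2-D uint8 grayscale frames.
    Returns the indices i at which frame i starts a new shot.
    """
    boundaries = []
    for i in range(1, len(frames)):
        prev = frames[i - 1] > 127                    # binarize previous frame
        curr = frames[i] > 127                        # binarize current frame
        changed = np.logical_xor(prev, curr).mean()   # fraction of flipped pixels
        if changed > change_thresh:
            boundaries.append(i)
    return boundaries
```

A master frame can then be taken from each detected shot, for example its first or middle frame.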
2.1. Spatiotemporal Modeling Using Interest Points. An interest point is an identified key point within an image that is used for further processing. Interest points have the following properties:
(i) It has a well-defined position and a mathematical description in the image.

(ii) The local image structure around it is rich in local information content.

(iii) It is stable under local and global perturbations in the image domain.
2.2. SIFT and CFD-Based Approach. SIFT detects interest points across various scales. With the help of Gaussian filters, the difference between successive Gaussian-blurred images is computed to excerpt interest points [17]. The difference of Gaussians (DoG) image DoG(a, b, Δ) of an original image Im(a, b) is given by

DoG(a, b, Δ) = L(a, b, L_iΔ) − L(a, b, L_jΔ),  (1)

where L(a, b, LΔ) denotes the convolution of the original image Im(a, b) with the Gaussian blur G(a, b, LΔ) at scale LΔ [19].
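Equation (1) can be illustrated with a small NumPy sketch that blurs an image at two scales and subtracts, assuming a separable Gaussian kernel truncated at three standard deviations; this illustrates the DoG idea, not the full SIFT scale-space implementation.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3*sigma, normalized to sum to 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur: convolve rows, then columns."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def difference_of_gaussians(img, sigma_i, sigma_j):
    """DoG = L(., sigma_i) - L(., sigma_j), as in equation (1)."""
    return gaussian_blur(img, sigma_i) - gaussian_blur(img, sigma_j)
```

Interest points are then taken at local extrema of the DoG response across space and scale.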
CFD uses affine adaptation to approximate the shape of an affine region in order to compute covariant features of an image. With the help of affine adaptation, interest points can be calculated as follows [10]:

Cfd(U) = [ De_aa(U)  De_ab(U) ; De_ab(U)  De_bb(U) ],  (2)

where De_aa(U) is the second partial derivative in direction a and De_ab(U) is the mixed second partial derivative in directions a and b. The computed features of the input and target images are matched using Euclidean distance to obtain a similarity value [20]. The calculated distances are sorted to build a ranked list of target videos by their similarity to the input video.
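The matching and ranking step can be sketched as below, assuming each video is summarized by a matrix with one descriptor per row; the scoring rule (mean nearest-neighbour Euclidean distance) is one reasonable choice, not necessarily the exact one used by the authors.

```python
import numpy as np

def match_score(query_desc, target_desc):
    """Mean nearest-neighbour Euclidean distance from query to target descriptors.

    Lower scores mean more similar videos.
    """
    # pairwise distance matrix of shape (n_query, n_target)
    d = np.linalg.norm(query_desc[:, None, :] - target_desc[None, :, :], axis=2)
    return d.min(axis=1).mean()

def rank_targets(query_desc, target_descs):
    """Return target indices sorted from most to least similar."""
    scores = [match_score(query_desc, t) for t in target_descs]
    return sorted(range(len(target_descs)), key=lambda i: scores[i])
```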
2.3. HOG and Distance Transform (DT). The HOG feature detector uses edge directions to describe local object appearance and shape. Each image is split into small cells, and a histogram of gradient directions is computed for each cell. In order to improve accuracy, contrast normalization of the local histograms is performed by measuring the intensity across an image block; the Dalal-Triggs method can be used for normalization:

f = N / sqrt(||N||_2^2 + d^2),  (3)

where N is the unnormalized vector containing all histograms of a given block, d is a small constant, and ||N||_k denotes its k-norm for k = 1, 2 [21].
The DT descriptor [3], also called a distance map, labels pixels by measuring their distance to edges obtained with Canny's edge detector. The framework of the key-point-based method is shown in Figure 1.
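As a sketch of the HOG computation described above, the following NumPy snippet bins gradient magnitudes of one cell by orientation and applies the Dalal-Triggs normalization of equation (3); the 9-bin, unsigned-orientation setup is an assumed (though conventional) choice.

```python
import numpy as np

def cell_histogram(magnitude, angle_deg, bins=9):
    """Orientation histogram of one cell: gradient magnitudes binned by unsigned angle (0-180)."""
    bin_width = 180.0 / bins
    idx = np.floor(angle_deg / bin_width).astype(int) % bins
    hist = np.zeros(bins)
    np.add.at(hist, idx.ravel(), magnitude.ravel())   # accumulate magnitudes per bin
    return hist

def normalize_block(v, d=1e-3):
    """Dalal-Triggs L2 normalization, equation (3): f = v / sqrt(||v||_2^2 + d^2)."""
    return v / np.sqrt(np.sum(v ** 2) + d ** 2)
```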
2.4. Statistics-Based Framework for Video Retrieval. The statistics-based technique is an approach to scrutinize, condense, and interpret data. For this purpose, we use GMM with Kullback-Leibler divergence to model the spatiotemporal content of videos. The method is described below.
2.5. Spatiotemporal Modeling through GMM. Surgical footage is an object containing highly complex multidimensional content based on time-series data. Assume that a video V of length l is an arrangement of frames (v1, …, vn), with every frame vi providing one element of a sequential pattern. Each element vi describes a video frame, and its spatial features are modeled in a d-dimensional feature space. The spatial features of a frame constitute facts of locality together with feature magnitudes, for example, the colour of every pixel (red, green, and blue), so that objects may easily be traced. In order to model the RGB colour distribution, the pixel positions can be stated as shown in Figure 2.
A spatiotemporal probabilistic model can be constructedthrough statistical observation of samples by using time andspace features. The sampling of the video is performed at a
Table 1: Comparison of proposed methodology with existing ones.

| Reference papers | Feature set | Feature descriptor | Similarity/cluster/classification measurement | Dataset | Research domain | Video type |
|---|---|---|---|---|---|---|
| Droueche et al. [18] | A motion vector, residual information | Kalman filter, generalized Gaussian distribution | Extension of fast dynamic time warping (EFDTW) | 69 video-recorded retinal surgery steps, 1400 video-recorded cataract surgeries, and 1707 movie clips with classified human actions | Eye surgery | MPEG-4 AVC/H.264 |
| Ramya et al. [4] | Global features | Entropy, histogram, frequency | Neural network | 15478 XML scripts based on key frames | Medical videos | mp4 |
| Schoeffmann et al. [5] | Position, colour, texture | N/A | Earth mover's distance, signature quadratic form distance, and signature matching distance | 25 stored video files per procedure | Laparoscopic surgery | N/A |
| Beecks et al. [6] | Spatial information (colour, coarseness, contrast value) | Local feature descriptor | k-means cluster | 25 segment videos per procedure, resulting in total 1276 short video clips ranging from 60 sec to 250 sec | Endoscopic videos | N/A |
| Carlos et al. [7] | Local and global features, feature fusion | SIMPLE, CEDD, SURF | k-means | 1276 | Endoscopy videos | N/A |
| Charriere et al. [8] | N/A | BoVW | k-nearest neighbour (KNN), motion histogram, and hidden Markov model | 30 | Cataract surgery | DV format |
| Muhammad et al. [9] | Colour opponent colour space (CoC), curvature map, texture | N/A | N/A | 10 | Hysteroscopy | N/A |
| Quellec et al. [10] | Texture, colour, optical flow | N/A | N/A | 23 retinal, 100 cataract surgery, 69 Hollywood movies | Eye surgery | MPEG2 |
| Droueche et al. [18] | Motion histogram | Fast dynamic time warping | k-means, Kalman filter, and extended fast dynamic time warping | 23 epiretinal surgery videos | Epiretinal membrane surgery | MPEG-4 AVC/H.264 |
| Proposed | Key point, colour | SIFT, HOG, DT, CFD, PCA-SIFT, spatiotemporal feature vector | GMM, Euclidean distance, and Kullback-Leibler divergence | 20 meniscus surgery | Meniscus surgery | Flv, Avi, mp4 |
presumed rate, preserving temporal dependence between adjoining images. The spatial features of every image depend on its spatial context and may depend on its former image, which is known as the spatiotemporal association.
Furthermore, GMM [22] is applied to approximate the likelihood density of the collected items. GMM is an unsupervised learning approach in which a probabilistic model is built under the assumption that the data points are produced by a mixture of a finite number of Gaussian distributions with unknown parameters.
Moreover, the d-dimensional spatial elements of an image xi may be modeled through a GMM with K components, where K represents a presumed number of frame structures, for instance, the number of core objects found in a particular section. Using MLE, an association can be computed through the probability of spatiotemporal continuity between two successive images, as presented in Figure 3.
We obtain Gaussians for the input and output videos after applying maximum likelihood, as shown in Figures 4 and 5. By analyzing the data distributions of these Gaussians, we can determine the degree of dissimilarity between input and output. To measure the dissimilarity ratio, Kullback-Leibler divergence, also known as relative entropy, is used.
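The dissimilarity measurement can be illustrated with the closed-form Kullback-Leibler divergence between two multivariate Gaussians, which applies directly to individual GMM components (for full mixtures the divergence has no closed form and is typically approximated); this sketch is one assumed way the comparison can be realized.

```python
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """KL(N(mu0, cov0) || N(mu1, cov1)) in closed form for multivariate Gaussians."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)     # trace term
                  + diff @ cov1_inv @ diff      # mean-shift term
                  - k                           # dimensionality
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))
```

Because KL divergence is asymmetric, a symmetrized form such as KL(p||q) + KL(q||p) is often used when ranking videos.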
2.6. Compression-Based Video Retrieval. In order to reduce insignificance and redundancy, a data compression technique can be used. The intention of compression is to cut down the data dimensions for effective storage, retrieval, and manipulation. For this purpose, principal component analysis (PCA) is used to extract the significant components that compactly define the entire image. Moreover, a hybrid approach is applied by combining PCA with SIFT and GMM to improve the precision of the retrieval method. Details of the method are as follows.
2.6.1. Summary of PCA. Principal component analysis is a dimension reduction method which calculates linear projections that preserve the maximum variance of the original data.
Figure 1: Framework of key-point-based retrieval (query image → feature descriptors: SIFT + covariant feature detector, HOG + distance transform → feature matching against input images → resultant videos).
Figure 2: Temporal connection is created to form continuity between the consecutive frames.
Figure 3: Spatiotemporal probabilistic model for a video frame sequence (x1, …, xn), showing the spatiotemporal context (location, magnitude, and time) and the spatiotemporal correlation.
Figure 4: GMM of an input query.
Eigenvectors a with eigenvalues λ are computed from the covariance matrix E_x of the data D ∈ R^(d×M), where d is the dimension of the original data and M is the total number of points:

E_x a = λa.  (4)

A point x_i is visualized by projecting

y_i = A x_i,  (5)

where A is the matrix of eigenvectors associated with the 2 or 3 largest eigenvalues and y_i is the computed lower-dimensional sketch of x_i. PCA is the most commonly used method to visualize new data. Projections produced by PCA are easy to interpret and work moderately well even when the variance of the data is primarily concentrated in a few directions [23].
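Equations (4) and (5) can be realized in a few lines of NumPy; this sketch assumes the data matrix D holds one point per column and is mean-centred before the covariance is formed.

```python
import numpy as np

def pca_project(D, n_components):
    """Project a d x M data matrix D onto its top principal components (equations (4)-(5))."""
    X = D - D.mean(axis=1, keepdims=True)      # centre the data
    Ex = X @ X.T / X.shape[1]                  # d x d covariance matrix
    lam, vecs = np.linalg.eigh(Ex)             # eigenvalues in ascending order
    A = vecs[:, ::-1][:, :n_components].T      # rows = eigenvectors of largest eigenvalues
    return A @ X                               # n_components x M projection, y_i = A x_i
```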
2.6.2. PCA-SIFT-Based Video Retrieval. In this method, PCA is applied to SIFT, which is highly robust to image distortion. PCA-SIFT first computes an eigenspace to define a compact representation of the native patch. In the second step, the local image gradient is calculated on a given patch. In the third step, it projects the gradient image vector through the eigenspace to derive a compact feature vector. This feature vector is significantly smaller than the standard SIFT feature vector and may be used with the same matching method. The Euclidean distance between two feature vectors can be used to decide whether they describe the same interest point in different frames. The following are the steps to retrieve videos using the PCA-SIFT approach.
(1) The input shot is transformed into frames.

(2) Master frames are extracted using shot boundary detection.

(3) For each master frame, PCA-SIFT is calculated.

(4) Steps 1–3 are repeated for target videos 1–N.

(5) After step 4, two sets of feature vectors are obtained, that is, for the input video and the target videos [1–N].

(6) The feature vectors of input and target videos are compared using Euclidean distance to compute their similarity.
2.6.3. PCA-GMM-Based Video Retrieval. The analyses of Figures 4 and 5 show that the spatial context comprises location and magnitude features for each pixel, such as the RGB colour channels. These colour channels are extracted from the input and target videos and passed through a dimension reduction process using PCA. The compressed channels are then modeled using GMM for spatiotemporal analysis. Furthermore, Kullback-Leibler divergence is applied in the same way as presented in Section 2.5.
2.6.4. Temporal Point Extraction from Surgical Videos by Sparse Optical Flow. A surgical video is a dynamic entity in which the motion of objects plays a vital role in understanding the ongoing surgical process. To capture the movement of the surgical procedure, we use optical flow, which is the pattern of apparent motion of objects in a visual scene induced by the relative movement between the scene and an observer, that is, the camera. The Lucas-Kanade method is used to capture the motion of objects within a video of live surgery [17].
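For a single small window, the Lucas-Kanade estimate solves a 2 × 2 least-squares system built from the spatial and temporal image gradients. The sketch below is a bare-bones single-window version, not the pyramidal tracker typically used in practice.

```python
import numpy as np

def lucas_kanade_window(frame0, frame1):
    """Estimate one (u, v) optical-flow vector for a window via Lucas-Kanade.

    Solves A [u, v]^T = b, with A built from the spatial gradients of frame0
    and b from the temporal difference frame1 - frame0.
    """
    Ix = np.gradient(frame0, axis=1)           # horizontal spatial gradient
    Iy = np.gradient(frame0, axis=0)           # vertical spatial gradient
    It = frame1 - frame0                       # temporal gradient
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)               # (u, v)
```

Sparse optical flow applies this estimate only at selected interest points rather than at every pixel.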
3. Results and Discussion
In this section, we present experimental settings includingthe data set and evaluation method.
3.1. Dataset. The dataset comprises 20 real meniscus surgical videos collected from various publicly available online repositories: MedlinePlus [24], AAOS [25], vjortho (2017), and Northgate (2017). These shots are sufficient and beneficial in the current scenario because they cover a wide range of scenarios and uniquely identify a number of instances. They also provide a high degree of relevance with reference to their surroundings and properly differentiate from nonadjacent objects. Furthermore, it is clear from Table 1 that the state-of-the-art systems also used small testing datasets. For result evaluation of the proposed system, we asked three surgeons to select the ground truth from our dataset. The ground truth is selected in terms of videos; a related approach and mechanism can be found in [9]. Sample frames from meniscal surgery are shown in Figures 6–8.
3.2. Tool. To proceed to the evaluation phase, an input video is provided to our proposed system. Its spatiotemporal features are computed against the 1–N target surgical videos, and the most relevant videos are displayed on the right side of the interface, as shown in Figure 9.
3.3. Evaluation Measurement. For the purposes of a systemevaluation, there are many parameters such as precision,
Figure 5: GMM of the output query.
recall, and ROC area. In information retrieval systems, the precision metric is used because it provides the proportion of retrieved videos that are actually relevant. In order to measure the precision of our system, we split each video into small segments and compared these segments with the ground truth. Precision can be computed using the following metric [26, 27]:

Precision = TP / (TP + FP).  (6)
Herein, a video segment is a true positive (TP) if it is selected by both our system and the surgeon, and a false positive (FP) if it is selected by our system but not by the surgeon. Using this scheme, we measured the average precision of our system with the four proposed methods, as shown in Table 2. As mentioned, no video retrieval system specifically for meniscus surgery was found in the literature; therefore, we considered a number of general video retrieval parameters for result comparison. A brief comparison of medical video retrieval systems and our proposed system is shown in Table 2.
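Under the TP/FP scheme just described, precision for one query can be computed as below; the segment identifiers are hypothetical.

```python
def precision(system_selected, surgeon_selected):
    """Precision = TP / (TP + FP), equation (6).

    system_selected: set of segment ids retrieved by the system.
    surgeon_selected: set of segment ids marked relevant by the surgeon (ground truth).
    """
    tp = len(system_selected & surgeon_selected)   # retrieved and relevant
    fp = len(system_selected - surgeon_selected)   # retrieved but not relevant
    return tp / (tp + fp) if (tp + fp) else 0.0
```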
We can deduce from Table 2 that PCA-SIFT performs best among our proposed methods. A key factor in this dominant performance is that the gradient patches around SIFT points are highly structured and therefore well represented by PCA. The eigenvalues for these patches decline considerably faster than the eigenvalues for arbitrarily selected gradient patches. Because the patches surrounding key points share definite characteristics arising from the SIFT detection stage, they are fully adjusted, rotated to align the dominant gradient with the vertical, and scaled correctly. This simplifies the work PCA must do and enables PCA-SIFT to accurately characterize a patch with a small number of dimensions, together with the Lucas-Kanade optical flow method.
4. Conclusion
Our proposed video retrieval system for meniscal surgery is based on four methods: key-point-based retrieval, statistical retrieval, PCA-SIFT-based retrieval, and PCA-GMM-based retrieval. They further exploit spatiotemporal features for the improvement of health services to human beings.

In the key-point-based retrieval method, key points of a video are extracted using SIFT, HOG, CFD, and DT. Secondly, in the statistical retrieval method, spatial and temporal attributes are statistically modeled using GMM and Kullback-Leibler divergence. Moreover, in the PCA-SIFT-based and PCA-GMM-based retrieval methods, a dimension reduction technique is combined with SIFT and GMM, respectively. To track spatial development over temporal intervals, the Lucas-Kanade sparse optical flow method is used, which is efficient and robust to noise.
Figure 7: Frame extracted from meniscal reconstruction surgery (2).

Figure 8: Frame extracted from meniscal reconstruction surgery (3).

Figure 9: Sample screenshot of relevant videos using HOG and CFD.

Figure 6: Frame extracted from meniscal reconstruction surgery (1).
Table 2: Comparative analysis of proposed system with already existing systems.

| # | Dataset | Clip size | Average retrieval time | Average number of matched frames/videos | Recall | Average precision | ROC | Description |
|---|---|---|---|---|---|---|---|---|
| 1 | Medical videos & classified human action | Avg 20 sec | 7 min 3 sec | 0 | 0% | 0.43 | 0 | Time required to compute feature vector for 9 videos; 0.43 in case of ERM and CD dataset |
| 2 | Medical videos | 1.97 min and above | 16.4 (retrieval time for all attributes) | 111 | 0% | 0.983 | 0 | Precision is based on 8 videos using the neural network, in terms of relevant key frame retrieval and the total number of key frames for that video |
| 3 | Laparoscopic videos | 0 | 0 | 0 | 80% | 0 | 0 | N/A |
| 4 | Endoscopic videos | 60 sec to 250 sec | 0 | 0 | 88% | 0 | 0 | N/A |
| 5 | Endoscopy | 0 | 0 | 0 | 0% | 0.78 | 0 | N/A |
| 6 | Cataract | 0 | 0 | 0 | 0% | 0.7 | — | With motion analysis, and 0.9 with the presence of surgical process |
| 7 | Hysteroscopy | 2 to 3 min | 0 | 0 | 0% | 0.92 | 0 | — |
| 8 | Eye surgery | 0 | 0 | 0 | 0% | 0.72 | 0 | In the case of CSD dataset |
| 9 | Epiretinal surgery | 0 | 0 | 0 | 0% | 0.62 | 0 | 0.69 in case of motion histogram |
| Proposed | Meniscal surgery | 60 sec | 60 sec | 10 | 78% | 0.57, 0.68, 0.64, 0.78 | 0 | 0.78 for PCA-SIFT |
For the experimental work, we used 20 videos of meniscus surgery, and our evaluation shows that the results gained using PCA-SIFT are the most significant, with an average precision of 0.78.
Our proposed system will assist in the treatment of diverse patients, and surgeons in particular can get guidance by watching videos of surgeries performed by experts. In future work, further datasets of real meniscal surgery will be explored for testing and evaluation.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the MIST (Ministry of Science and ICT), Korea, under the National Program for Excellence in SW supervised by the IITP (Institute for Information & Communications Technology Promotion) (2015-0-00938).
References
[1] J. B. Thorlund, K. B. Hare, and L. S. Lohmander, “Largeincrease in arthroscopic meniscus surgery in the middle-agedand older population in Denmark from 2000 to 2011,” ActaOrthopaedica, vol. 85, no. 3, pp. 287–292, 2014.
[2] K. Charriere, G. Quellec, M. Lamard, G. Coatrieux,B. Cochener, and G. Cazuguel, “Automated surgical step rec-ognition in normalized cataract surgery videos,” in 2014 36thAnnual International Conference of the IEEE Engineering inMedicine and Biology Society, pp. 4647–4650, Chicago, IL,USA, 2014.
[3] P. Over, J. Fiscus, G. Sanders et al., “TRECVID 2014 – anoverview of the goals, tasks, data, evaluation mechanismsand metrics,” in Proceedings of TRECVid, p. 52, Orlando, FL,USA, 2014, NIST.
[4] S. T. Ramya, P. Rangarajan, and V. Gomathi, “XML basedapproach for object-oriented medical video retrieval usingneural networks,” Journal of Medical Imaging and HealthInformatics, vol. 6, no. 3, pp. 794–801, 2016.
[5] K. Schoeffmann, C. Beecks, M. Lux, M. S. Uysal, and T. Seidl,“Content-based retrieval in videos from laparoscopic surgery,”in Medical Imaging 2016: Image-Guided Procedures, RoboticInterventions, and Modeling, vol. 9786, pp. 1–10, San Diego,CA, USA, 2016.
[6] C. Beecks, K. Schoeffmann, M. Lux, M. S. Uysal, and T. Seidl,“Endoscopic video retrieval: a signature-based approach forlinking endoscopic images with video segments,” in 2015 IEEEInternational Symposium on Multimedia (ISM), pp. 33–38,Miami, FL, USA, 2015.
[7] J. R. Carlos, M. Lux, X. Giro-i-Nieto, P. Munoz, andN. Anagnostopoulos, “Visual information retrieval in endo-scopic video archives,” in 2015 13th International Workshopon Content-Based Multimedia Indexing (CBMI), pp. 1–6,Prague, Czech Republic, 2015, IEEE.
[8] K. Charriere, G. Quellec, M. Lamard et al., “Real-time multilevel sequencing of cataract surgery videos,” in 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6, Bucharest, Romania, 2016, IEEE.
[9] K. Muhammad, M. Sajjad, M. Y. Lee, and S. W. Baik, “Efficient visual attention driven framework for key frames extraction from hysteroscopy videos,” Biomedical Signal Processing and Control, vol. 33, pp. 161–168, 2017.
[10] G. Quellec, M. Lamard, Z. Droueche, B. Cochener, C. Roux, and G. Cazuguel, “A polynomial model of surgical gestures for real-time retrieval of surgery videos,” in Medical Content-Based Retrieval for Clinical Decision Support, vol. 7723 of Lecture Notes in Computer Science, pp. 10–20, Springer, 2013.
[11] C. A. Kelsey, P. H. Heintz, D. J. Sandoval, G. D. Chambers, N. L. Adolphi, and K. S. Paffett, “Radiation interactions with tissue,” in Radiation Biology of Medical Imaging, pp. 61–79, John Wiley & Sons, Inc., 2014.
[12] D. Markonis, M. Holzer, F. Baroz et al., “User-oriented evaluation of a medical image retrieval system for radiologists,” International Journal of Medical Informatics, vol. 84, no. 10, pp. 774–783, 2015.
[13] S. W. Zhang, Y. J. Shang, and L. Wang, “Plant disease recognition based on plant leaf image,” The Journal of Animal & Plant Sciences, vol. 25, no. 3, Supplement 1, pp. 42–45, 2015.
[14] J. Choi, Z. Wang, S. C. Lee, and W. J. Jeon, “A spatio-temporal pyramid matching for video retrieval,” Computer Vision and Image Understanding, vol. 117, no. 6, pp. 660–669, 2013.
[15] H. Bai, Y. Dong, S. Cen et al., Orange Labs Beijing (FTRDBJ) at TRECVID 2013: Instant Search, International Journal of Multimedia Information Retrieval, 2013.
[16] S. Bharathi, D. Shenoy, R. Venugopal, and M. Patnaik, “Ensemble PHOG and SIFT features extraction techniques to classify high resolution satellite images,” Data Mining and Knowledge Engineering, vol. 6, no. 5, pp. 199–206, 2014.
[17] A. Bruhn, J. Weickert, and C. Schnörr, “Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods,” International Journal of Computer Vision, vol. 61, no. 3, pp. 1–21, 2005.
[18] Z. Droueche, M. Lamard, G. Cazuguel, G. Quellec, C. Roux, and B. Cochener, “Motion-based video retrieval with application to computer-assisted retinal surgery,” in 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4962–4965, San Diego, CA, USA, 2012.
[19] Z. Droueche, G. Quellec, M. Lamard, G. Cazuguel, B. Cochener, and C. Roux, “Computer-aided retinal surgery using data from the video compressed stream,” International Journal of Image and Video Processing: Theory and Application, vol. 1, no. 1, 2014.
[20] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[21] B. V. Patel and B. B. Meshram, “Content based video retrieval systems,” International Journal of UbiComp, vol. 3, no. 2, pp. 13–30, 2012.
[22] B. Werner, S. Harald, and M. Roland, Semantic Indexing and Instance Search: Joanneum Research at TRECVID, TRECVID Workshop, 2013.
[23] T. Tuytelaars and K. Mikolajczyk, “Local invariant feature detectors: a survey,” Foundations and Trends® in Computer Graphics and Vision, vol. 3, no. 3, pp. 177–280, 2008.
[24] MedlinePlus, U.S. National Library of Medicine, Bethesda, MD, USA, April 2017, https://medlineplus.gov/surgeryvideos.html.
Journal of Sensors
[25] American Academy of Orthopaedic Surgeons, April 2017, http://www.aaos.org/videogallery/.
[26] J. Davis and M. Goadrich, “The relationship between precision-recall and ROC curves,” in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 233–240, Pittsburgh, PA, USA, 2006.
[27] M. Powers, “Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation,” 2014, March 2016, https://medtube.net/orthopedics.