Research Article
Video Retrieval System for Meniscal Surgery to Improve Health Care Services
Sana Amanat,1 Muhammad Idrees,2 Muhammad Usman Ghani Khan,1 Zahoor Rehman ,3
Hangbae Chang,4 Irfan Mehmood ,5 and Sung Wook Baik5
1Department of Computer Science and Engineering, University of Engineering and Technology Lahore, Lahore, Pakistan
2Department of Computer Science and Engineering, University of Engineering and Technology Lahore Narowal Campus, Lahore, Pakistan
3Department of Computer Science, Comsats Institute of Information Technology Attock Campus, Attock, Pakistan
4Department of Industrial Security, College of Business and Economics, Chung-Ang University, Seoul, Republic of Korea
5Department of Software, Sejong University, Seoul, Republic of Korea
Correspondence should be addressed to Irfan Mehmood; [email protected]
Received 18 January 2018; Accepted 29 April 2018; Published 14 June 2018
Academic Editor: Ka L. Man
Copyright © 2018 Sana Amanat et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Meniscal surgery is among the most common orthopedic procedures; it deals with the treatment of meniscus tears, a frequent contusion to the cartilage that stabilizes and cushions the human knee joint. Such tears can be classified into different categories based on age group, region, and occupation. A large number of sportsmen and heavy weightlifters, even in developed countries, are affected by meniscus injuries. These patients undergo arthroscopic surgery, during which the preservation of the meniscus is a crucial task. Current research reports a significant ratio of meniscal tear patients around the globe, and this ratio has risen strikingly, with a mean annual rate of 0.066%, due to surgery failure. To reduce this ratio, an innovative training mechanism based on a video retrieval system is proposed in this research. This work focuses on developing a corpus and video retrieval system for meniscus surgery. Using the proposed system, surgeons can obtain guidance by watching videos of surgeries performed by experts and their seniors. The proposed system comprises four spatiotemporal approaches to improve health care services: key point, statistical modeling, PCA-scale invariant feature transform (SIFT), and PCA-Gaussian mixture model (GMM), each combined with sparse optical flow. A real meniscal surgery dataset is used for testing and evaluation. The results show that the PCA-SIFT approach performs best, with an average precision of 0.78.
1. Introduction
A meniscus tear is a common lesion of the cartilage that cushions and stabilizes the human knee joint. It is a frequent disorder across the globe, with a mean annual rate of 0.066%; notably, about 66 meniscus injury incidences per 100,000 people are observed worldwide. These incidences can be categorized by various aspects, that is, age group, region, and occupation. Most affected sportsmen and heavy weightlifters younger than 40 years in developed countries recovered through meniscus surgery, covering North America, Latin America, Asia, and Africa. In Denmark, on the other hand, more degenerative changes in the annual incidence are observed: the incidence rose from 164 to 312 over the past 10 years, and about 2 million cases of meniscus surgery treatment were observed. Details of registered patients, diagnoses, and procedures maintained by the Danish National Patient Register can be found in [1] and Summit (2016).
Hindawi, Journal of Sensors, Volume 2018, Article ID 4390703, 10 pages, https://doi.org/10.1155/2018/4390703

Previous studies have reported a very high rate of repeat surgery due to the failure of meniscus surgery. Different causes have been observed for this failure; one of the major causes is inappropriate removal of the meniscus, because inexperienced surgeons do not know how much, or which part, has to be removed. More precisely, it deals with the interaction of tissues and radiation, which may be represented in the form of static images, videos, or recordings. These recordings can be used to train surgeons and practitioners to perform successful surgeries. One way to access these recordings is through a video retrieval system that aims to retrieve videos from a multimedia repository. Upon a surgeon's request, a video retrieval system can provide the most relevant and accurate videos of meniscal surgery. There have been several studies on surgery video retrieval, that is, for the eyes, lungs, heart, and kidney, but meniscal surgery holds a unique position: it is common in most countries, yet neither a corpus nor a video retrieval system specifically for meniscal surgery is found in the literature [2].
To retrieve retinal surgery videos using compressed video streams, the required motion information is extracted at regular intervals. Precisely, two types of motion information are excerpted: one is the region trajectory and the other comprises the residual error. A heterogeneous feature vector is used to combine the motion information, and sequence segmentation is utilized to extract it. The Kalman filter and generalized Gaussian distribution are utilized to formulate the region trajectory and residual information. In the last step, an extension of fast dynamic time warping is used along with the feature vector to compare videos [3].
Another approach to retrieving medical videos first extracts a key frame; its global features are then excerpted and saved in XML script format as an object to reduce storage space. Furthermore, a neural network and naïve Bayes are used to enhance performance. Features of an image are extracted using entropy, Huffman coding, and fast Fourier transforms [4].
In order to retrieve laparoscopy videos from a colossal repository, a feature signature is utilized which is composed of abstract and local details of the image. Initially, position, colour, and texture features of an image are considered for this signature. A weight is assigned to each feature, describing its potential to serve as a signature representative. To compare two signatures, earth mover's distance, signature quadratic form distance, and signature matching distance are used in combination [5]. To retrieve endoscopy videos from a repository, a feature signature is computed based on global features, that is, colour, position, and texture. A weight is assigned to each feature, and zero-weighted features are considered irrelevant for the signature; features with nonzero weight are denoted as signature representatives. Furthermore, the cluster centroids of these features are used as feature representatives. To measure the similarity between endoscopic images, signature matching distance and adaptive binning feature signatures are used on the basis of visual characteristics [6].
In another approach to retrieving endoscopic videos, local, global, and feature fusion techniques are employed. To extract features, various algorithms such as CEDD, SURF, and SIMPLE are used. Additionally, early and late fusion approaches are considered. These approaches were tested on 1276 video clips. The LIRE software library, which can extract 20 visual features of an image, is used to index test frames [7].
To retrieve cataract surgery videos from a repository, a video is divided into a number of overlapping clips, each containing the same number of frames. Various features are extracted from these frames using the histogram of oriented gradients (HOG), BoVW, and motion histograms. These features are statistically modeled using Bayesian networks, CRFs, and HMMs. Thirty cataract surgery videos of 720 × 576 pixels were used for testing. A mean area under the ROC curve of 0.61 was achieved in recognizing surgical steps from the video stream [8]. In order to retrieve and analyse hysteroscopy videos of the female reproductive system, a video summarization framework was proposed. To compute the feature set, colour information is extracted from key frames using the COC colour model. Other salient features such as texture, motion, curvature, and contrast are also extracted [9].
Another approach to retrieving cataract videos formulates a feature set using global features such as colour, texture, and motion information of the frame. Furthermore, the wavelet transform is performed on the extracted colour channels. In this study, cataract and epiretinal surgeries were considered [10]. To retrieve similar videos in the domain of retinal surgery, a motion-based approach is found in the literature which utilizes MPEG-4 to extract features from the frame. For video encoding, groups of pictures and macroblocks are used, where three types of frames are considered. Motion histograms, classification, and motion trajectories are used to formulate the motion feature set. Furthermore, signatures are composed, and to measure their similarity ratio, a fast dynamic time warping approach is used. This approach was applied to epiretinal membrane surgery.
In order to control the failure rate of surgery and to overcome the mentioned drawbacks, there is a dire need for a sophisticated approach. It could be realized by building a connection between information technology and the medical sciences. Information technology is capable of providing medical diagnosis, clinical analysis, and treatment planning through a practice known as medical imaging [11, 2, 12].
Our proposed system firstly focuses on spatial features, that is, the location of objects in reference to static objects [13]. Secondly, it deals with the temporal information of an image, which provides a glimpse of moving and operational objects [14]. The system comprises four methods: interest point, statistical modeling, PCA-SIFT, and PCA-GMM-based video retrieval. The first method incorporates interest points of videos, which are excerpted using up-to-date feature description and detection algorithms known as SIFT, HOG, and the covariant feature detector (CFD) with distance transform (DT), addressed by Bai et al. [15] and Bharathi et al. [16]. The second method encompasses statistical modeling of features using GMM and Kullback-Leibler divergence. In the third and fourth methods, a dimension reduction technique is integrated with SIFT and GMM, respectively. To extract temporal features, Lucas-Kanade sparse optical flow is incorporated in each method [17]. This proposed system can help new
and less-experienced surgeons to get appropriate information about surgical methods, procedures, and tools to perform a successful surgery [11, 2, 12]. These meniscal recordings can additionally be used to brief patients and in their follow-up sessions [9].
Table 1 gives a summarized comparison of previous video retrieval approaches. It is clear from this comparison that many systems exist for medical video retrieval, but a video retrieval system for meniscal surgery practitioners is not found despite the need for one around the globe. In our system, we also use a hybrid of state-of-the-art techniques for feature detection, recognition, and comparison. The most important conclusion of this summary is a step towards a new research direction.
2. Materials and Methods
This section sheds light on the proposed methodology for meniscal surgery video retrieval. A surgical video is considered to be composed of frames which are chronologically bound together to tell a tale, that is, the live surgical process. Each frame grasps an instant of surgery, producing a coherent association with contiguous frames. Algorithms in this framework are applied to frames instead of whole videos, because the frame is the building block of a video. Therefore, the videos are decomposed into frames, and shot boundaries are detected using XOR. As a result, collections of master frames depicting master shots are extracted. These master frames are further utilized to model the spatiotemporal features of a particular video. Details of the spatiotemporal feature extraction and modeling techniques are as follows.
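The XOR-based shot boundary step above can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the binarization level and the fraction of changed pixels that signals a new shot are assumed values.

```python
import numpy as np

def shot_boundaries(frames, change_thresh=0.3):
    """Detect shot boundaries by XOR-ing consecutive binarized frames.

    frames: sequence of 2-D uint8 grayscale frames.
    Returns the indices i at which frame i starts a new shot.
    """
    boundaries = []
    for i in range(1, len(frames)):
        prev = frames[i - 1] > 127                    # binarize previous frame
        curr = frames[i] > 127                        # binarize current frame
        changed = np.logical_xor(prev, curr).mean()   # fraction of flipped pixels
        if changed > change_thresh:
            boundaries.append(i)
    return boundaries
```

A master frame can then be taken from each detected shot, for example its first or middle frame.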
2.1. Spatiotemporal Modeling Using Interest Points. An interest point is an identified key point within an image that is used for further processing. Interest points have the following properties:
(i) It has a well-defined position and a mathematical description in the image.

(ii) The local image structure around it is rich in local information content.

(iii) It is stable under local and global perturbations in the image domain.
2.2. SIFT and CFD-Based Approach. SIFT detects interest points across various scales. With the help of Gaussian filters, the difference between successive Gaussian-blurred images is computed to excerpt interest points [17]. The difference of Gaussians (DoG) image DoG(a, b, Δ) of an original image Im(a, b) is given by

DoG(a, b, Δ) = L(a, b, L_iΔ) − L(a, b, L_jΔ),  (1)

where L(a, b, LΔ) denotes the convolution of the original image Im(a, b) with the Gaussian blur G(a, b, LΔ) at scale LΔ [19].
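Equation (1) can be illustrated with a small NumPy sketch that blurs an image at two scales and subtracts, assuming a separable Gaussian kernel truncated at three standard deviations; this illustrates the DoG idea, not the full SIFT scale-space implementation.

```python
import numpy as np

def gaussian_kernel(sigma):
    """1-D Gaussian kernel truncated at 3*sigma, normalized to sum to 1."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-x ** 2 / (2 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur: convolve rows, then columns."""
    k = gaussian_kernel(sigma)
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="same"), 0, rows)

def difference_of_gaussians(img, sigma_i, sigma_j):
    """DoG = L(., sigma_i) - L(., sigma_j), as in equation (1)."""
    return gaussian_blur(img, sigma_i) - gaussian_blur(img, sigma_j)
```

Interest points are then taken at local extrema of the DoG response across space and scale.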
CFD uses affine adaptation to approximate the shape of an affine region in order to compute covariant features of an image. With the help of affine adaptation, interest points can be calculated as follows [10]:

Cfd(U) = [ De_aa(U)  De_ab(U) ; De_ab(U)  De_bb(U) ],  (2)

where De_aa(U) is the second partial derivative in direction a and De_ab(U) is the mixed second partial derivative in directions a and b. The computed features of the input and target images are matched using Euclidean distance to obtain a similarity value [20]. The calculated distances are sorted to build a ranked list of target videos by their similarity to the input video.
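The matching and ranking step can be sketched as below, assuming each video is summarized by a matrix with one descriptor per row; the scoring rule (mean nearest-neighbour Euclidean distance) is one reasonable choice, not necessarily the exact one used by the authors.

```python
import numpy as np

def match_score(query_desc, target_desc):
    """Mean nearest-neighbour Euclidean distance from query to target descriptors.

    Lower scores mean more similar videos.
    """
    # pairwise distance matrix of shape (n_query, n_target)
    d = np.linalg.norm(query_desc[:, None, :] - target_desc[None, :, :], axis=2)
    return d.min(axis=1).mean()

def rank_targets(query_desc, target_descs):
    """Return target indices sorted from most to least similar."""
    scores = [match_score(query_desc, t) for t in target_descs]
    return sorted(range(len(target_descs)), key=lambda i: scores[i])
```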
2.3. HOG and Distance Transform (DT). The HOG feature detector uses edge directions to describe local object appearance and shape. Each image is split into small cells, and a histogram of gradient directions is computed for each cell. In order to improve accuracy, contrast normalization of the local histograms is performed by measuring the intensity across an image block; the Dalal-Triggs method can be used for normalization:

f = N / sqrt(||N||_2^2 + d^2),  (3)

where N is the unnormalized vector containing all histograms of a given block, d is a small constant, and ||N||_k denotes its k-norm for k = 1, 2 [21].
The DT descriptor [3], also called a distance map, labels pixels by measuring their distance to edges obtained with Canny's edge detector. The framework of the key-point-based method is shown in Figure 1.
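As a sketch of the HOG computation described above, the following NumPy snippet bins gradient magnitudes of one cell by orientation and applies the Dalal-Triggs normalization of equation (3); the 9-bin, unsigned-orientation setup is an assumed (though conventional) choice.

```python
import numpy as np

def cell_histogram(magnitude, angle_deg, bins=9):
    """Orientation histogram of one cell: gradient magnitudes binned by unsigned angle (0-180)."""
    bin_width = 180.0 / bins
    idx = np.floor(angle_deg / bin_width).astype(int) % bins
    hist = np.zeros(bins)
    np.add.at(hist, idx.ravel(), magnitude.ravel())   # accumulate magnitudes per bin
    return hist

def normalize_block(v, d=1e-3):
    """Dalal-Triggs L2 normalization, equation (3): f = v / sqrt(||v||_2^2 + d^2)."""
    return v / np.sqrt(np.sum(v ** 2) + d ** 2)
```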
2.4. Statistics-Based Framework for Video Retrieval. The statistics-based technique is an approach to scrutinize, condense, and interpret data. For this purpose, we use GMM with Kullback-Leibler divergence to model the spatiotemporal content of videos. The method is described below.
2.5. Spatiotemporal Modeling through GMM. Surgical footage is an object containing highly complex multidimensional content based on time-series data. Assume that a video V of length l is an arrangement of frames (v1, …, vn), with every frame vi providing one element of a sequential pattern. Each element vi describes a video frame, and its spatial features are modeled in a d-dimensional feature space. The spatial features of a frame constitute facts of locality together with feature magnitudes, for example, the colour of every pixel (red, green, and blue), so that objects may easily be traced. In order to model the RGB colour distribution, the pixel positions can be stated as shown in Figure 2.
A spatiotemporal probabilistic model can be constructedthrough statistical observation of samples by using time andspace features. The sampling of the video is performed at a
Table 1: Comparison of proposed methodology with existing ones.

| Reference papers | Feature set | Feature descriptor | Similarity/cluster/classification measurement | Dataset | Research domain | Video type |
|---|---|---|---|---|---|---|
| Droueche et al. [18] | A motion vector, residual information | Kalman filter, generalized Gaussian distribution | Extension of fast dynamic time warping (EFDTW) | 69 video-recorded retinal surgery steps, 1400 video-recorded cataract surgeries, and 1707 movie clips with classified human actions | Eye surgery | MPEG-4 AVC/H.264 |
| Ramya et al. [4] | Global features | Entropy, histogram, frequency | Neural network | 15478 XML scripts based on key frames | Medical videos | mp4 |
| Schoeffmann et al. [5] | Position, colour, texture | N/A | Earth mover's distance, signature quadratic form distance, and signature matching distance | 25 stored video files per procedure | Laparoscopic surgery | N/A |
| Beecks et al. [6] | Spatial information (colour, coarseness, contrast value) | Local feature descriptor | k-means cluster | 25 segment videos per procedure, resulting in total 1276 short video clips ranging from 60 sec to 250 sec | Endoscopic videos | N/A |
| Carlos et al. [7] | Local and global features, feature fusion | SIMPLE, CEDD, SURF | k-means | 1276 | Endoscopy videos | N/A |
| Charriere et al. [8] | N/A | BoVW | k-nearest neighbour (KNN), motion histogram, and hidden Markov model | 30 | Cataract surgery | DV format |
| Muhammad et al. [9] | Colour opponent colour space (CoC), curvature map, texture | N/A | N/A | 10 | Hysteroscopy | N/A |
| Quellec et al. [10] | Texture, colour, optical flow | N/A | N/A | 23 retinal, 100 cataract surgery, 69 Hollywood movies | Eye surgery | MPEG2 |
| Droueche et al. [18] | Motion histogram | Fast dynamic time warping | k-means, Kalman filter, and extended fast dynamic time warping | 23 epiretinal surgery videos | Epiretinal membrane surgery | MPEG-4 AVC/H.264 |
| Proposed | Key point, colour | SIFT, HOG, DT, CFD, PCA-SIFT, spatiotemporal feature vector | GMM, Euclidean distance, and Kullback-Leibler divergence | 20 meniscus surgery | Meniscus surgery | Flv, Avi, mp4 |
presumed rate, preserving temporal dependence between adjoining images. The spatial features of every image depend on its spatial context and may depend on its former image, which is known as the spatiotemporal association.
Furthermore, GMM [22] is applied to approximate the likelihood density of the collected items. GMM is an unsupervised learning approach in which a probabilistic model is built under the assumption that the data points are produced by a mixture of a finite number of Gaussian distributions with unknown parameters.
Moreover, the d-dimensional spatial elements of an image xi may be modeled through a GMM with K components, where K represents a presumed number of frame structures, for instance, the number of core objects found in a particular section. Using MLE, an association can be computed through the probability of spatiotemporal continuity between two successive images, as presented in Figure 3.
We obtain Gaussians for the input and output videos after applying maximum likelihood, as shown in Figures 4 and 5. By analyzing the data distributions of these Gaussians, we can determine the degree of dissimilarity between input and output. To measure the dissimilarity ratio, Kullback-Leibler divergence, also known as relative entropy, is used.
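The dissimilarity measurement can be illustrated with the closed-form Kullback-Leibler divergence between two multivariate Gaussians, which applies directly to individual GMM components (for full mixtures the divergence has no closed form and is typically approximated); this sketch is one assumed way the comparison can be realized.

```python
import numpy as np

def kl_gaussian(mu0, cov0, mu1, cov1):
    """KL(N(mu0, cov0) || N(mu1, cov1)) in closed form for multivariate Gaussians."""
    k = mu0.shape[0]
    cov1_inv = np.linalg.inv(cov1)
    diff = mu1 - mu0
    return 0.5 * (np.trace(cov1_inv @ cov0)     # trace term
                  + diff @ cov1_inv @ diff      # mean-shift term
                  - k                           # dimensionality
                  + np.log(np.linalg.det(cov1) / np.linalg.det(cov0)))
```

Because KL divergence is asymmetric, a symmetrized form such as KL(p||q) + KL(q||p) is often used when ranking videos.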
2.6. Compression-Based Video Retrieval. In order to reduce insignificance and redundancy, a data compression technique can be used. The intention of compression is to cut down the data dimensions for effective storage, retrieval, and manipulation. For this purpose, principal component analysis (PCA) is used to extract the significant components that compactly define the entire image. Moreover, a hybrid approach is applied by combining PCA with SIFT and GMM to improve the precision of the retrieval method. Details of the method are as follows.
2.6.1. Summary of PCA. Principal component analysis is a dimension reduction method which calculates linear projections that preserve the maximum variance of the original data.
Figure 1: Framework of key-point-based retrieval (query image → feature descriptors: SIFT + covariant feature detector, HOG + distance transform → feature matching against input images → resultant videos).
Figure 2: Temporal connection is created to form continuity between the consecutive frames.
Figure 3: Spatiotemporal probabilistic model for a video frame sequence (x1, …, xn), showing the spatiotemporal context (location, magnitude, and time) and the spatiotemporal correlation.
Figure 4: GMM of an input query.
Eigenvectors a with eigenvalues λ are computed from the covariance matrix E_x of the data D ∈ R^(d×M), where d is the dimension of the original data and M is the total number of points:

E_x a = λa.  (4)

A point x_i is visualized by projecting

y_i = A x_i,  (5)

where A is the matrix of eigenvectors associated with the 2 or 3 largest eigenvalues and y_i is the computed lower-dimensional sketch of x_i. PCA is the most commonly used method to visualize new data. Projections produced by PCA are easy to interpret and work moderately well even when the variance of the data is primarily concentrated in a few directions [23].
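Equations (4) and (5) can be realized in a few lines of NumPy; this sketch assumes the data matrix D holds one point per column and is mean-centred before the covariance is formed.

```python
import numpy as np

def pca_project(D, n_components):
    """Project a d x M data matrix D onto its top principal components (equations (4)-(5))."""
    X = D - D.mean(axis=1, keepdims=True)      # centre the data
    Ex = X @ X.T / X.shape[1]                  # d x d covariance matrix
    lam, vecs = np.linalg.eigh(Ex)             # eigenvalues in ascending order
    A = vecs[:, ::-1][:, :n_components].T      # rows = eigenvectors of largest eigenvalues
    return A @ X                               # n_components x M projection, y_i = A x_i
```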
2.6.2. PCA-SIFT-Based Video Retrieval. In this method, PCA is applied to SIFT, which is highly robust to image distortion. PCA-SIFT first computes an eigenspace to define a compact representation of the native patch. In the second step, the local image gradient is calculated on a given patch. In the third step, it projects the gradient image vector through the eigenspace to derive a compact feature vector. This feature vector is significantly smaller than the standard SIFT feature vector and may be used with the same matching method. The Euclidean distance between two feature vectors can be used to decide whether they describe the same interest point in different frames. The following are the steps to retrieve videos using the PCA-SIFT approach.
(1) The input shot is transformed into frames.

(2) Master frames are extracted using shot boundary detection.

(3) For each master frame, PCA-SIFT is calculated.

(4) Steps 1–3 are repeated for target videos 1–N.

(5) After step 4, two sets of feature vectors are obtained, that is, for the input video and the target videos [1–N].

(6) The feature vectors of input and target videos are compared using Euclidean distance to compute their similarity.
2.6.3. PCA-GMM-Based Video Retrieval. The analyses of Figures 4 and 5 show that the spatial context comprises location and magnitude features for each pixel, such as the RGB colour channels. These colour channels are extracted from the input and target videos and passed through a dimension reduction process using PCA. The compressed channels are then modeled using GMM for spatiotemporal analysis. Furthermore, Kullback-Leibler divergence is applied in the same way as presented in Section 2.5.
2.6.4. Temporal Point Extraction from Surgical Videos by Sparse Optical Flow. A surgical video is a dynamic entity in which the motion of objects plays a vital role in understanding the ongoing surgical process. To capture the movement of the surgical procedure, we use optical flow, which is the pattern of apparent motion of objects in a visual scene induced by the relative movement between the scene and an observer, that is, the camera. The Lucas-Kanade method is used to capture the motion of objects within a video of live surgery [17].
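For a single small window, the Lucas-Kanade estimate solves a 2 × 2 least-squares system built from the spatial and temporal image gradients. The sketch below is a bare-bones single-window version, not the pyramidal tracker typically used in practice.

```python
import numpy as np

def lucas_kanade_window(frame0, frame1):
    """Estimate one (u, v) optical-flow vector for a window via Lucas-Kanade.

    Solves A [u, v]^T = b, with A built from the spatial gradients of frame0
    and b from the temporal difference frame1 - frame0.
    """
    Ix = np.gradient(frame0, axis=1)           # horizontal spatial gradient
    Iy = np.gradient(frame0, axis=0)           # vertical spatial gradient
    It = frame1 - frame0                       # temporal gradient
    A = np.array([[np.sum(Ix * Ix), np.sum(Ix * Iy)],
                  [np.sum(Ix * Iy), np.sum(Iy * Iy)]])
    b = -np.array([np.sum(Ix * It), np.sum(Iy * It)])
    return np.linalg.solve(A, b)               # (u, v)
```

Sparse optical flow applies this estimate only at selected interest points rather than at every pixel.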
3. Results and Discussion
In this section, we present experimental settings includingthe data set and evaluation method.
3.1. Dataset. The dataset comprises 20 real meniscus surgical videos collected from various publicly available online repositories: MedlinePlus [24], AAOS [25], vjortho (2017), and Northgate (2017). These shots are sufficient and beneficial in the current scenario because they cover a wide range of scenarios and uniquely identify a number of instances. They also provide a high degree of relevance with reference to their surroundings and properly differentiate from nonadjacent objects. Furthermore, it is clear from Table 1 that the state-of-the-art systems also used small testing datasets. For result evaluation of the proposed system, we asked three surgeons to select the ground truth from our dataset. The ground truth is selected in terms of videos; a related approach and mechanism can be found in [9]. Sample frames from meniscal surgery are shown in Figures 6–8.
3.2. Tool. To proceed to the evaluation phase, an input video is provided to our proposed system. Its spatiotemporal features are computed against the 1–N target surgical videos, and the most relevant videos are displayed on the right side of the interface, as shown in Figure 9.
3.3. Evaluation Measurement. For the purposes of a systemevaluation, there are many parameters such as precision,
Figure 5: GMM of the output query.
recall, and ROC area. In information retrieval systems, the precision metric is used because it provides the proportion of retrieved videos that are actually relevant. In order to measure the precision of our system, we split each video into small segments and compared these segments with the ground truth. Precision can be computed using the following metric [26, 27]:

Precision = TP / (TP + FP).  (6)
Herein, a video segment is a true positive (TP) if it is selected by both our system and the surgeon, and a false positive (FP) if it is selected by our system but not by the surgeon. Using this scheme, we measured the average precision of our system with the four proposed methods, as shown in Table 2. As mentioned, no video retrieval system specifically for meniscus surgery was found in the literature; therefore, we considered a number of general video retrieval parameters for result comparison. A brief comparison of medical video retrieval systems and our proposed system is shown in Table 2.
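Under the TP/FP scheme just described, precision for one query can be computed as below; the segment identifiers are hypothetical.

```python
def precision(system_selected, surgeon_selected):
    """Precision = TP / (TP + FP), equation (6).

    system_selected: set of segment ids retrieved by the system.
    surgeon_selected: set of segment ids marked relevant by the surgeon (ground truth).
    """
    tp = len(system_selected & surgeon_selected)   # retrieved and relevant
    fp = len(system_selected - surgeon_selected)   # retrieved but not relevant
    return tp / (tp + fp) if (tp + fp) else 0.0
```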
We can deduce from Table 2 that PCA-SIFT performs best among our proposed methods. A key factor in this dominant performance is that the gradient patches around SIFT points are highly structured and therefore well represented by PCA. The eigenvalues for these patches decline considerably faster than the eigenvalues for arbitrarily selected gradient patches. Because the patches surrounding key points share definite characteristics arising from the SIFT detection stage, they are fully adjusted, rotated to align the dominant gradient with the vertical, and scaled correctly. This simplifies the work PCA must do and enables PCA-SIFT to accurately characterize a patch with a small number of dimensions, together with the Lucas-Kanade optical flow method.
4. Conclusion
Our proposed video retrieval system for meniscal surgery is based on four methods: key-point-based retrieval, statistical retrieval, PCA-SIFT-based retrieval, and PCA-GMM-based retrieval. They further exploit spatiotemporal features for the improvement of health services to human beings.

In the key-point-based retrieval method, key points of a video are extracted using SIFT, HOG, CFD, and DT. Secondly, in the statistical retrieval method, spatial and temporal attributes are statistically modeled using GMM and Kullback-Leibler divergence. Moreover, in the PCA-SIFT-based and PCA-GMM-based retrieval methods, a dimension reduction technique is combined with SIFT and GMM, respectively. To track spatial development over temporal intervals, the Lucas-Kanade sparse optical flow method is used, which is efficient and robust to noise.
Figure 7: Frame extracted from meniscal reconstruction surgery (2).

Figure 8: Frame extracted from meniscal reconstruction surgery (3).

Figure 9: Sample screenshot of relevant videos using HOG and CFD.

Figure 6: Frame extracted from meniscal reconstruction surgery (1).
Table 2: Comparative analysis of proposed system with already existing systems.

| # | Dataset | Clip size | Average retrieval time | Average number of matched frames/videos | Recall | Average precision | ROC | Description |
|---|---|---|---|---|---|---|---|---|
| 1 | Medical videos & classified human action | Avg 20 sec | 7 min 3 sec | 0 | 0% | 0.43 | 0 | Time required to compute feature vector for 9 videos; 0.43 in case of ERM and CD dataset |
| 2 | Medical videos | 1.97 min and above | 16.4 (retrieval time for all attributes) | 111 | 0% | 0.983 | 0 | Precision is based on 8 videos using the neural network, in terms of relevant key frame retrieval and the total number of key frames for that video |
| 3 | Laparoscopic videos | 0 | 0 | 0 | 80% | 0 | 0 | N/A |
| 4 | Endoscopic videos | 60 sec to 250 sec | 0 | 0 | 88% | 0 | 0 | N/A |
| 5 | Endoscopy | 0 | 0 | 0 | 0% | 0.78 | 0 | N/A |
| 6 | Cataract | 0 | 0 | 0 | 0% | 0.7 | — | With motion analysis, and 0.9 with the presence of surgical process |
| 7 | Hysteroscopy | 2 to 3 min | 0 | 0 | 0% | 0.92 | 0 | — |
| 8 | Eye surgery | 0 | 0 | 0 | 0% | 0.72 | 0 | In the case of CSD dataset |
| 9 | Epiretinal surgery | 0 | 0 | 0 | 0% | 0.62 | 0 | 0.69 in case of motion histogram |
| Proposed | Meniscal surgery | 60 sec | 60 sec | 10 | 78% | 0.57, 0.68, 0.64, 0.78 | 0 | 0.78 for PCA-SIFT |
For the experimental work, we used 20 videos of meniscus surgery, and our evaluation shows that the results gained using PCA-SIFT are the most significant, with an average precision of 0.78.
Our proposed system will assist in the treatment of diverse patients, and surgeons in particular can get guidance by watching videos of surgeries performed by experts. In future work, further datasets of real meniscal surgery will be explored for testing and evaluation.
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This research was supported by the MIST (Ministry of Science and ICT), Korea, under the National Program for Excellence in SW supervised by the IITP (Institute for Information & Communications Technology Promotion) (2015-0-00938).
References
[1] J. B. Thorlund, K. B. Hare, and L. S. Lohmander, “Largeincrease in arthroscopic meniscus surgery in the middle-agedand older population in Denmark from 2000 to 2011,” ActaOrthopaedica, vol. 85, no. 3, pp. 287–292, 2014.
[2] K. Charriere, G. Quellec, M. Lamard, G. Coatrieux,B. Cochener, and G. Cazuguel, “Automated surgical step rec-ognition in normalized cataract surgery videos,” in 2014 36thAnnual International Conference of the IEEE Engineering inMedicine and Biology Society, pp. 4647–4650, Chicago, IL,USA, 2014.
[3] P. Over, J. Fiscus, G. Sanders et al., “TRECVID 2014 – anoverview of the goals, tasks, data, evaluation mechanismsand metrics,” in Proceedings of TRECVid, p. 52, Orlando, FL,USA, 2014, NIST.
[4] S. T. Ramya, P. Rangarajan, and V. Gomathi, “XML basedapproach for object-oriented medical video retrieval usingneural networks,” Journal of Medical Imaging and HealthInformatics, vol. 6, no. 3, pp. 794–801, 2016.
[5] K. Schoeffmann, C. Beecks, M. Lux, M. S. Uysal, and T. Seidl,“Content-based retrieval in videos from laparoscopic surgery,”in Medical Imaging 2016: Image-Guided Procedures, RoboticInterventions, and Modeling, vol. 9786, pp. 1–10, San Diego,CA, USA, 2016.
[6] C. Beecks, K. Schoeffmann, M. Lux, M. S. Uysal, and T. Seidl,“Endoscopic video retrieval: a signature-based approach forlinking endoscopic images with video segments,” in 2015 IEEEInternational Symposium on Multimedia (ISM), pp. 33–38,Miami, FL, USA, 2015.
[7] J. R. Carlos, M. Lux, X. Giro-i-Nieto, P. Munoz, andN. Anagnostopoulos, “Visual information retrieval in endo-scopic video archives,” in 2015 13th International Workshopon Content-Based Multimedia Indexing (CBMI), pp. 1–6,Prague, Czech Republic, 2015, IEEE.
[8] K. Charriere, G. Quellec, M. Lamard et al., “Real-time multilevel sequencing of cataract surgery videos,” in 2016 14th International Workshop on Content-Based Multimedia Indexing (CBMI), pp. 1–6, Bucharest, Romania, 2016, IEEE.
[9] K. Muhammad, M. Sajjad, M. Y. Lee, and S. W. Baik, “Efficient visual attention driven framework for key frames extraction from hysteroscopy videos,” Biomedical Signal Processing and Control, vol. 33, pp. 161–168, 2017.
[10] G. Quellec, M. Lamard, Z. Droueche, B. Cochener, C. Roux, and G. Cazuguel, “A polynomial model of surgical gestures for real-time retrieval of surgery videos,” in Medical Content-Based Retrieval for Clinical Decision Support, vol. 7723 of Lecture Notes in Computer Science, pp. 10–20, Springer, 2013.
[11] C. A. Kelsey, P. H. Heintz, D. J. Sandoval, G. D. Chambers, N. L. Adolphi, and K. S. Paffett, “Radiation interactions with tissue,” in Radiation Biology of Medical Imaging, pp. 61–79, John Wiley & Sons, Inc., 2014.
[12] D. Markonis, M. Holzer, F. Baroz et al., “User-oriented evaluation of a medical image retrieval system for radiologists,” International Journal of Medical Informatics, vol. 84, no. 10, pp. 774–783, 2015.
[13] S. W. Zhang, Y. J. Shang, and L. Wang, “Plant disease recognition based on plant leaf image,” The Journal of Animal & Plant Sciences, vol. 25, no. 3, Supplement 1, pp. 42–45, 2015.
[14] J. Choi, Z. Wang, S. C. Lee, and W. J. Jeon, “A spatio-temporal pyramid matching for video retrieval,” Computer Vision and Image Understanding, vol. 117, no. 6, pp. 660–669, 2013.
[15] H. Bai, Y. Dong, S. Cen et al., Orange Labs Beijing (FTRDBJ) at TRECVID 2013: Instant Search, International Journal of Multimedia Information Retrieval, 2013.
[16] S. Bharathi, D. Shenoy, R. Venugopal, and M. Patnaik, “Ensemble PHOG and SIFT features extraction techniques to classify high resolution satellite images,” Data Mining and Knowledge Engineering, vol. 6, no. 5, pp. 199–206, 2014.
[17] A. Bruhn, J. Weickert, and C. Schnörr, “Lucas/Kanade meets Horn/Schunck: combining local and global optic flow methods,” International Journal of Computer Vision, vol. 61, no. 3, pp. 1–21, 2005.
[18] Z. Droueche, M. Lamard, G. Cazuguel, G. Quellec, C. Roux, and B. Cochener, “Motion-based video retrieval with application to computer-assisted retinal surgery,” in 2012 Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 4962–4965, San Diego, CA, USA, 2012.
[19] Z. Droueche, G. Quellec, M. Lamard, G. Cazuguel, B. Cochener, and C. Roux, “Computer-aided retinal surgery using data from the video compressed stream,” International Journal of Image and Video Processing: Theory and Application, vol. 1, no. 1, 2014.
[20] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[21] B. V. Patel and B. B. Meshram, “Content based video retrieval systems,” International Journal of UbiComp, vol. 3, no. 2, pp. 13–30, 2012.
[22] B. Werner, S. Harald, and M. Roland, Semantic Indexing and Instance Search: Joanneum Research at TRECVID, TRECVID Workshop, 2013.
[23] T. Tuytelaars and K. Mikolajczyk, “Local invariant feature detectors: a survey,” Foundations and Trends® in Computer Graphics and Vision, vol. 3, no. 3, pp. 177–280, 2008.
[24] MedlinePlus, U.S. National Library of Medicine, Bethesda, MD, USA, April 2017, https://medlineplus.gov/surgeryvideos.html.
Journal of Sensors
[25] American Academy of Orthopaedic Surgeons, April 2017, http://www.aaos.org/videogallery/.
[26] J. Davis and M. Goadrich, “The relationship between precision-recall and ROC curves,” in Proceedings of the 23rd International Conference on Machine Learning (ICML '06), pp. 233–240, Pittsburgh, PA, USA, 2006.
[27] M. Powers, “Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation,” 2014, March 2016, https://medtube.net/orthopedics.