
Caio César Sabino Silva

MOTION COMPENSATED PERMUTATION-BASED VIDEO

ENCRYPTION

Federal University of Pernambuco

[email protected]

www.cin.ufpe.br/~posgraduacao

RECIFE 2015

Caio César Sabino Silva

MOTION COMPENSATED PERMUTATION-BASED VIDEO ENCRYPTION

An M.Sc. dissertation presented to the Center for Informatics of the Federal University of Pernambuco in partial fulfillment of the requirements for the degree of Master of Science in Computer Science.

Advisor: Tsang Ing Ren

Co-Advisor: George Darmiton da Cunha Cavalcanti

RECIFE 2015

Cataloging at the source

Librarian: Monick Raquel Silvestre da S. Portes, CRB4-1217

S586m Silva, Caio César Sabino

Motion compensated permutation-based video encryption / Caio César Sabino Silva. – 2015.
72 f.: il., fig., tab.
Advisor: Tsang Ing Ren.
Dissertation (M.Sc.) – Universidade Federal de Pernambuco. CIn, Ciência da Computação, Recife, 2015.
Includes references.

1. Computer science. 2. Multimedia data security. I. Ren, Tsang Ing (advisor). II. Title.

004 CDD (23. ed.) UFPE-MEI 2017-35

M.Sc. dissertation presented by Caio César Sabino Silva to the Graduate Program in Computer Science of the Centro de Informática of the Universidade Federal de Pernambuco, under the title "Motion Compensated Permutation-based Video Encryption", advised by Prof. Tsang Ing Ren and approved by the examining board formed by the following professors:

______________________________________________

Prof. Carlos Alexandre Barros de Mello

Centro de Informática / UFPE

______________________________________________

Profa. Vanessa Testoni

Samsung Research Brazil

_______________________________________________

Prof. Tsang Ing Ren

Centro de Informática / UFPE

Approved and cleared for printing.

Recife, August 25, 2015.

_______________________________________

Profa. Edna Natividade da Silva Barros

Coordinator of the Graduate Program in Computer Science of the Centro de Informática of the Universidade Federal de Pernambuco.

I dedicate this dissertation to my mother.

Acknowledgements

I would like to thank everyone who assisted me, directly or indirectly, in the development of this dissertation project:

to my parents, José Antonio and Valdeci Sabino, above anything, for giving me all the support I needed and for encouraging me to always try my best and never give up under any circumstances.

to my brother, Eduardo, for helping me find interest in science, giving me the support and attention I needed, and always pushing me forward in whatever activity I was involved in.

to professor Carlos Alexandre and researcher Vanessa Testoni for kindly accepting the invitation to participate in the examining board. It is an honor for me to have both of you evaluating this work.

to my advisor Tsang Ing Ren, who helped me in so many ways, not only in the master's degree program but also during my undergraduate studies. I am especially grateful to him because he cares about me as a friend and not only as a student, and I always appreciate his opinions and career advice. He also motivates me to become more active in the research community and always finds interesting new problems for us to work on.

to my co-advisor George Darmiton, who always had interesting insights about the work, even when it was not exactly his own field, and who collaborated on the work developed in this dissertation.

to my friend Lais Sousa, who started working on this research with me when we were still undergraduate students. I feel a big part of this work is also hers, since she collaborated on it from the very beginning. I also appreciate all of her support, algorithmic vision, and theoretical insights, which contributed a lot to my Computer Science background.

to my friend Ruan Carvalho, who has been with me throughout my undergraduate and master's programs. I am especially grateful to him for the countless times he went out of his way just to help me, and for helping me with my career decisions, always listening with an open mind and advising me.

to my friends Amora, Anália and Lorena, who have been quite close to me through this time, having my back and giving me friendly support. It feels good to know I have these friends to count on if something bad happens. It was always enjoyable to spend time with them, which helped relieve a lot of the stress of working on this project.

Rather than love, than money, than fame, give me truth.

—HENRY DAVID THOREAU (WALDEN)

Resumo

No contexto de segurança de aplicações multimídia, técnicas de encriptação de vídeo têm sido desenvolvidas com o intuito de assegurar a confidencialidade das informações contidas em tal tipo de mídia. Compressão e encriptação costumavam ser consideradas áreas opostas em termos de exploração de entropia de dados; entretanto, nas últimas décadas houve um aumento significativo no volume de dados operado por aplicações de encriptação de vídeo, o que exigiu melhorias na compressão de vídeos encriptados. Neste sentido, diversas técnicas têm sido desenvolvidas como codificação de entropia, provendo encriptação e compressão simultaneamente.

Um esquema criptográfico existente, introduzido por Socek et al., é baseado em transformações de permutação e aplica encriptação anteriormente à fase de compressão. A encriptação aplicada por essa técnica pode ser considerada não tão segura quanto um esquema criptográfico convencional, mas ainda aceitável pela maioria das aplicações de vídeo. Ela é capaz de melhorar a correlação espacial do vídeo original, caso os quadros consecutivos sejam suficientemente similares, tornando-o possivelmente mais compressível que o vídeo original.

Entretanto, o esquema criptográfico original foi projetado para explorar apenas a correlação espacial de cada quadro, e codificadores podem explorar também correlação temporal não trivial. Além disso, as melhorias na correlação espacial advindas das transformações de permutação dependem fortemente da correlação temporal natural do vídeo. Portanto, a performance do esquema é extremamente associada à quantidade de movimento no vídeo. O trabalho desenvolvido nesta dissertação tem como objetivo estender esse esquema criptográfico, incluindo conceitos de compensação de movimento nas transformações baseadas em permutação usadas na encriptação de vídeo, para melhorar sua performance e tornar o esquema mais resiliente a vídeos com muito movimento.

Palavras-chave: Encriptação de vídeo. Codificação de vídeo. Segurança de dados multimídia. Compensação de movimento. Correlação de dados. Compressão.

Abstract

In the context of multimedia application security, digital video encryption techniques have been developed to ensure the confidentiality of the information contained in this type of media. Compression and encryption used to be considered opposites in terms of exploiting the data's entropy; however, in recent decades there has been a significant increase in the volume of data handled by video encryption applications, which demanded improvements in the compressibility of encrypted video. In this sense, many techniques have been developed as entropy coding schemes that provide both encryption and compression simultaneously.

An existing cryptographic scheme, introduced by Socek et al., is based on permutation transformations and applies encryption prior to the compression stage. The encryption applied by this technique may not be as strong as a conventional encryption technique, but its security is still considered acceptable for most video applications. It can improve the original data's spatial correlation when consecutive frames are sufficiently similar, making the encrypted video possibly even more compressible than the original video.

However, the original cryptographic scheme was designed to exploit only the spatial correlation within each frame, while codecs can also exploit non-trivial temporal correlation. Moreover, the improvements in spatial correlation produced by the permutation transformations rely heavily on the natural temporal correlation of the video, so the scheme's performance is strongly tied to the amount of motion in the scene. The work developed in this dissertation aims to extend this cryptographic scheme by including motion compensation concepts in the permutation-based transformations used for video encryption, improving its performance and making it more resilient to high-motion videos.

Keywords: Video encryption. Video coding. Multimedia data security. Motion compensation. Data correlation. Compression.

List of Figures

1.1 Cryptography scenario

2.1 Video notation
2.2 Example of image histogram plot
2.3 Spatial correlation illustration
2.4 Temporal correlation
2.5 Video coding: frame types
2.6 Video coding scheme
2.7 Permutation and compressibility

3.1 Overview of cryptographic scheme
3.2 Block-based approach
3.3 Histogram hiding extension

4.1 Example of Three Step Search algorithm's convergence: the numbers indicate the center of candidate blocks at each execution step
4.2 Example of Two Dimensional Logarithmic Search algorithm's convergence: the numbers indicate the center of candidate blocks at each execution step
4.3 Linear motion estimation principle for frame prediction
4.4 Example of block motion vector parameters in a frame
4.5 Consecutive frames residual difference - Flower sequence
4.6 Consecutive residual frames with motion compensation - Flower sequence

6.1 Frame examples of the video sequences dataset
6.2 'Almost-sorting' permutation quality in Flower sequence frame 4
6.3 Bitstream size by frame plot comparing original method versus motion compensated extended algorithm (using TDL motion estimation)
6.4 PSNR-QP plots comparing the original Socek method with the extended motion compensation encryption version for each high motion sequence
6.5 PSNR-QP plots comparing the original Socek method with the extended motion compensation residual encryption version in each high motion sequence

List of Tables

6.1 Video sequences information
6.2 Bitstream size comparison of encrypted video (in KB) for different unique sorting permutation algorithms for MPNG codec
6.3 Bitstream size comparison of encrypted video (in KB) for different unique sorting permutation algorithms for H.264 codec with QP = 4
6.4 Average PSNR comparison of encrypted video (in dB) for different unique sorting permutation algorithms for H.264 codec with QP = 4
6.5 Average MSSIM comparison of encrypted video for different unique sorting permutation algorithms for H.264 codec with QP = 4
6.6 Bitstream size comparison (in KB) for the motion compensation extension using H.264 codec with QP = 4
6.7 Average MSSIM comparison for the motion compensation extension using H.264 codec with QP = 4
6.8 Average PSNR comparison (in dB) for the motion compensation extension using H.264 codec with QP = 4
6.9 Bitstream size reduction relative to Socek method for the extended methods in high motion sequences for different motion estimation algorithms using H.264 codec with QP = 4
6.10 Bitstream size comparison (in KB) for the motion compensated histogram hiding extension encryption using H.264 codec with QP = 4
6.11 Average MSSIM comparison for the motion compensated histogram hiding extension encryption using H.264 codec with QP = 4
6.12 Average PSNR comparison (in dB) for the motion compensated histogram hiding extension encryption using H.264 codec with QP = 4
6.13 Bitstream size reduction relative to Socek method for motion compensated histogram hiding encryption methods using H.264 codec with QP = 4

List of Algorithms

3.1 Unique sorting permutation method based on Quicksort
3.2 Video encryption algorithm
3.3 Video decryption algorithm
3.4 Constant camera translation adjustment for sorting permutation

4.1 Unique sorting permutation method based on Counting Sort
4.2 Motion compensated video encryption algorithm
4.3 Motion compensated video decryption algorithm
4.4 Motion compensated video encryption algorithm on the frame residuals
4.5 Motion compensated video decryption algorithm on the frame residuals

Contents

1 Introduction
   1.1 Multimedia data security
   1.2 Video encryption
   1.3 Objective of the research
   1.4 Organization

2 Digital video processing
   2.1 Definitions
   2.2 Video coding
      2.2.1 Data redundancy
      2.2.2 Video frame types
      2.2.3 Encoding and decoding module
   2.3 Permutations in digital videos

3 Permutation-based video encryption
   3.1 Cryptographic scheme
      3.1.1 Unique sorting permutation
      3.1.2 Encryption and decryption algorithms
   3.2 Advanced extensions
      3.2.1 Perception quality control
      3.2.2 Handling constant camera translation
      3.2.3 Histogram hiding method
      3.2.4 Dealing with frame losses

4 Extended encryption
   4.1 Unique sorting permutation
   4.2 Motion sensitivity
   4.3 Motion estimation
   4.4 Block-based motion estimation algorithms
      4.4.1 Minimization of dissimilarity function
      4.4.2 Three Step Search (TSS) algorithm
      4.4.3 Two Dimensional Logarithmic Search (TDL) algorithm
   4.5 Motion compensated 'almost-sorted' permutation
   4.6 Extended cryptographic scheme
   4.7 Histogram hiding extension
      4.7.1 Encryption and decryption extended algorithms

5 Cryptographic scheme analysis
   5.1 Space and processing time analysis
   5.2 Security analysis
      5.2.1 Brute force attack
      5.2.2 Known-plaintext and chosen-plaintext attack
      5.2.3 Chosen-ciphertext attack
   5.3 Codec module limitations

6 Experiments and results
   6.1 Video quality and compressibility metrics
   6.2 Video sequences dataset
   6.3 Codec module configuration
   6.4 Unique sorting permutation evaluation
   6.5 Motion compensated encryption evaluation
   6.6 Motion compensated residual encryption evaluation

7 Conclusion

References


1 Introduction

This chapter introduces the multimedia data security and encryption context, which is the concern of the research developed in this dissertation. It also presents an overview of the scope of this work and highlights the objective of the research.

1.1 Multimedia data security

In recent years, there has been considerable improvement in computer and networking technologies, which provide simple methods for processing, distributing and storing data for most user services and applications. However, due to the openness of wired and wireless networks, the data operated on by such applications can be easily copied or modified (JAKIMOSKI; SUBBALAKSHMI, 2008). In parallel with the growth of these technologies, digital rights management emerged as an important research area to guarantee the protection and authentication of copyrighted multimedia data (GRANGETTO; MAGLI; OLMO, 2006).

Cryptography is defined as the study of mathematical techniques related to information security aspects, such as confidentiality, data integrity, and entity and data origin authenticity (MENEZES; VANSTONE; OORSCHOT, 1996). Hence, this research area is useful to guarantee the secrecy of information that must be recognizable only by authorized individuals. This is usually done by applying a series of transformations to the original data, producing a format that is only restorable by individuals holding a given secret key. The process that makes the data illegible is referred to as encryption, while the reverse process to obtain the original message is called decryption.

The cryptography scenario is illustrated in Figure 1.1. Two entities (sender and receiver) attempt to establish a secure communication to transmit a message over a public channel. This channel is insecure and can be eavesdropped on by an unauthorized entity, i.e., an attacker, who tries to recover the original message. The original message to be transmitted is called the plaintext, while the encrypted message sent through the insecure channel is called the ciphertext.
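This round trip can be illustrated with a toy keyed scheme. The sketch below (with function names of our own choosing) is a length-preserving keyed shuffle, emphatically not a secure cipher: the sender and receiver derive the same secret permutation of positions from a shared key, echoing the permutation-based theme of this dissertation.

```python
import random

def encrypt(plaintext: bytes, key: int) -> bytes:
    # Derive a pseudo-random permutation of positions from the shared key.
    perm = list(range(len(plaintext)))
    random.Random(key).shuffle(perm)
    return bytes(plaintext[i] for i in perm)

def decrypt(ciphertext: bytes, key: int) -> bytes:
    # Rebuild the same permutation from the key and invert it.
    perm = list(range(len(ciphertext)))
    random.Random(key).shuffle(perm)
    out = bytearray(len(ciphertext))
    for pos, i in enumerate(perm):
        out[i] = ciphertext[pos]
    return bytes(out)

message = b"attack at dawn"
cipher = encrypt(message, key=2015)
assert decrypt(cipher, key=2015) == message  # receiver recovers the plaintext
```

Without the key, the attacker sees only a scrambled byte sequence; with it, the permutation is reproduced exactly and inverted.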


Figure 1.1: Cryptography scenario

In terms of information security, there are four main goals of cryptographic systems (MENEZES; VANSTONE; OORSCHOT, 1996):

Confidentiality: a service designed to prevent the content of the information from becoming legible to unauthorized users. The techniques providing such functionality usually apply mathematical functions that make the data unreadable.

Data integrity: a service used to recognize whether the message was modified by an unauthorized user. This is very important in the context of the Internet, since the connection between sender and receiver usually goes through many intermediate points, which could maliciously manipulate the data before delivering the message to the receiver.

Authentication: a service concerned with the identification of entities and of the information itself. In communication, it is important that both sides are able to identify each other. It can be provided as entity authentication, which concerns identifying the source and endpoint of the communication, or as data origin authentication, which basically checks the data integrity and assumes that if there was any unauthorized modification of the content, the data origin may have changed.

Non-repudiation: a service that assures that the validity of statements regarding the data cannot be manipulated by any entity. This usually requires the presence of a trusted third party able to resolve some types of conflict. In this context, it means, for instance, that no entity can deny any previous actions or commitments they claimed to have made.


1.2 Video encryption

In the context of video applications, due to the existing public networks and the widespread use of the Internet, information security has become an important issue (LIU; KOENIG, 2005), especially with regard to the secrecy of users' information. Therefore, the use of cryptographic techniques became a necessity in this area.

However, these applications have specific requirements that are not considered by regular encryption systems and that demanded the adaptation of conventional cryptographic schemes. Among these requirements, the following stand out (SOCEK et al., 2007):

Codec standard and video format compliance: the encrypted data must preserve the video compression format in such a way that standard decoders are able to decode it without errors, even without decryption, while perceiving an unrecognizable video. In other words, the encryption system must require no changes to the encoding and decoding modules, which should be usable as black boxes.

Perception quality control: the encrypted video should not reveal any information about the original video. Usually the encryption algorithm has a mechanism to control how much the perception quality is degraded in the encrypted data. In some partial-encryption applications, the perception quality is only partially degraded, so the video is still perceivable but some specific details cannot be recovered.

Processing speed: most video applications are real time, like video conferencing and video streaming, so they need an encryption algorithm fast enough for these conditions. It is also important to note that video data is usually large, which makes this task even harder.

Video compression: it is very desirable that the encrypted video have a bitstream size similar to that of the original video when both use the same coding modules under the same encoding settings.

Video encryption research is basically split into two main approaches. Selective encryption applies conventional encryption techniques to specific small parts of the video bitstream. The principle is that these encrypted parts are crucial for the video perception and small enough to allow applying a conventional encryption algorithm such as AES (Advanced Encryption Standard) or DES (Data Encryption Standard) (SINGH; SUPRIYA, 2013). Spanos and Maples, for instance, proposed an algorithm that encrypts only the I-frames of every MPEG group of pictures; since the decoding of P- and B-frames depends on the associated decrypted I-frames, they cannot be recovered by an unauthorized user either (MAPLES; SPANOS, 1995).
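The selective approach can be sketched as follows, under simplifying assumptions of our own: frames are modeled as (type, payload) pairs, and the toy XOR `cipher` merely stands in for a conventional algorithm such as AES.

```python
def cipher(payload: bytes, key: int) -> bytes:
    # Stand-in for a conventional cipher such as AES (a toy XOR, for illustration only).
    return bytes(b ^ key for b in payload)

def selective_encrypt(frames, key: int):
    # Encrypt only the I-frames; P- and B-frames are left untouched, but they
    # cannot be reconstructed without the I-frames they are predicted from.
    return [(ftype, cipher(data, key) if ftype == "I" else data)
            for ftype, data in frames]

gop = [("I", b"intra"), ("P", b"pred1"), ("B", b"pred2")]
encrypted = selective_encrypt(gop, key=0x5A)
assert encrypted[0][1] != b"intra"   # I-frame payload is scrambled
assert encrypted[1][1] == b"pred1"   # predicted frames pass through unchanged
```

The appeal is that only a small fraction of the bitstream pays the cost of strong encryption.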

The second methodology is full encryption, which applies to the whole bitstream an entropy coding that also provides encryption (LI; CHEN; ZHENG, 2004). The principle here is to design an efficient technique that encrypts and encodes at the same time. This methodology is more promising, since it usually does not require any modifications to the codec modules. However, it is a big challenge to design an efficient algorithm that processes large bitstreams and still provides safe encryption with good compression performance. Hence, the techniques in this category have often been proved insecure against some types of cryptographic attacks, or are meant for limited-security scenarios.

Some encryption algorithms are codec specific (SHI; BHARGAVA, 1998a,b), while others do not depend on the video codec used by the application. The codec-specific ones have a bigger potential for optimization toward better compressibility or video quality, but they have more limited usage and restrict the possibility of using different codecs in the application. Most codec-independent algorithms apply transformations to each video frame, which is then encoded and decoded by the codec module used as a black box.

1.3 Objective of the research

The focus of this work is to propose a secure, codec-independent video encryption algorithm for generic video applications, aiming also at solid video compression and quality performance with acceptable processing speed.

In order to be codec independent and optimize compression performance, some techniques apply the encryption before the coding stage in such a way that the encrypted video is highly compressible. In this context, the technique proposed by Socek et al. explores the duality of permutations in data encryption and compression (SOCEK et al., 2007). This technique is codec independent, with only a few codec restrictions, and explores the compressibility potential of a generic spatial codec.

This cryptographic scheme explores the spatial correlation within each video frame and uses permutations to enhance it in the encrypted data, assuming there is a trivial, natural temporal correlation between consecutive frames. The compression performance of the scheme is very sensitive to the amount of motion in the scene, and exploring non-trivial temporal correlation was not in the scope of that work.
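The duality can be sketched in a few lines (our own illustration of the principle, not Socek et al.'s exact algorithm): the permutation that sorts the pixels of one frame, applied to a similar consecutive frame, produces a result that is unrecognizable without the key yet 'almost sorted', and therefore highly compressible.

```python
import random
import zlib

random.seed(0)
# Frame 1: a spatially complex (hence poorly compressible) frame.
# Frame 2: the same scene with slight noise, i.e., high temporal correlation.
f1 = [random.randrange(256) for _ in range(64 * 64)]
f2 = [min(255, v + random.randint(0, 2)) for v in f1]

# The key material: the permutation that sorts frame 1's pixels.
perm = sorted(range(len(f1)), key=lambda i: f1[i])

# Applying frame 1's sorting permutation to frame 2 yields an
# 'almost sorted' (hence very compressible) encrypted frame.
encrypted = bytes(f2[i] for i in perm)

# The key holder inverts the permutation to recover frame 2 exactly.
decrypted = [0] * len(f2)
for pos, i in enumerate(perm):
    decrypted[i] = encrypted[pos]
assert decrypted == f2

print(len(zlib.compress(bytes(f2))), len(zlib.compress(encrypted)))
```

With a generic compressor standing in for a spatial codec, the permuted frame compresses to far fewer bytes than the raw frame, because its values appear in nearly ascending order. When the two frames differ substantially (high motion), the permutation is no longer 'almost sorting' and the advantage vanishes, which is exactly the sensitivity discussed above.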

However, many applications deal with high-motion videos, and most codecs nowadays also explore non-trivial temporal correlation in order to obtain better compression rates. The original method is studied and extended in this dissertation to include more complex video coding concepts, heavily used by modern codecs, in order to improve its performance.

This research can be considered an extension of the original method. The cryptographic scheme proposed by Socek focused on improving the spatial correlation of every video frame with permutations. This work expands that research by also exploring non-trivial temporal correlation, making the scheme more suitable for high-motion sequences.


1.4 Organization

This work is organized in seven chapters. The first introduces the context and motivation and highlights the objective of the research. In the second chapter, basic digital video encryption and compression concepts are explained. The third describes and analyzes the original method studied and extended in this work. In the fourth chapter, the temporal correlation extension proposed by this work is detailed. In the fifth chapter, the cryptographic system is analyzed in terms of security and performance. In the sixth chapter, experiments are conducted to evaluate the performance of the proposed methods. Lastly, in the seventh chapter, final considerations about this dissertation, as well as possible future works, are presented.


2 Digital video processing

This chapter introduces basic concepts related to digital video processing, focusing on the coding and encryption areas. It also defines the notation used throughout this document.

2.1 Definitions

A digital video can be denoted as a sequence of frames. Each frame is a digital image with a specific width and height. As defined in (GONZALEZ; WOODS, 2006), a digital image is a two-dimensional function f(x,y), where the variables x and y are spatial coordinates in the image and f(x,y) is the intensity or grayscale level at the point identified by the coordinates x and y. Since it is a digital image, x, y and f(x,y) are discrete values. The term pixel denotes an element of the digital image and is equivalent to a point in the space defined by the image function.

The definition above refers to grayscale images; however, it can easily be extended to colored images by considering f(x,y) as a tuple that represents the color itself, i.e., usually the coordinates in the color space used. In the RGB color space, for instance, each value of f(x,y) would be a tuple (R,G,B).

The notation used in this work, especially in the description of algorithms and procedures, is the following (also shown in Figure 2.1):

The total number of frames in a video is represented by the variable N.

The variables W and H represent, respectively, the width and height of each video frame. These two parameters are fixed for every frame of the video.

The n-th frame is denoted by Fn. The frames are indexed from 1 to N.

A pixel at coordinates (x,y) of a frame Fi has intensity denoted by Fi(x,y). The coordinate x is defined in the interval [0, W−1], while y is in [0, H−1]. The point (0,0) denotes the upper leftmost point in the image.


(a) frame sequence (b) frame parameters

Figure 2.1: Video notation

The pixel intensity in the image varies from 0 to Imax, according to the intensity resolution of the image. In practice, in an n-bit resolution image, Imax = 2^n − 1.
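In code form, this notation maps naturally onto nested array indexing. The sketch below is purely illustrative (the variable names N, W, H, Imax and F follow the text; a real implementation would use an image library):

```python
N, W, H = 2, 4, 3        # number of frames, frame width, frame height
n_bits = 8
Imax = 2 ** n_bits - 1   # maximum intensity: 255 for an 8-bit image

# F[n][y][x]: intensity of the pixel at coordinates (x, y) of frame n
# (0-indexed here, whereas the text indexes frames from 1 to N).
F = [[[0 for x in range(W)] for y in range(H)] for n in range(N)]

F[0][0][0] = Imax        # brighten the upper leftmost pixel of the first frame
assert all(0 <= F[n][y][x] <= Imax
           for n in range(N) for y in range(H) for x in range(W))
```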

The histogram of an image is the distribution of the pixel intensities in it. Mathematically, the histogram h(x) of an image can be defined as a function of the grayscale level x, where h(x) represents the number of pixels with intensity x in the image. A visual example of a histogram plot is shown in Figure 2.2.
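The definition translates directly into code. The sketch below is illustrative (the 2×3 frame and the `histogram` helper are our own):

```python
def histogram(frame, imax=255):
    """h[x] = number of pixels with intensity x in the frame."""
    h = [0] * (imax + 1)
    for row in frame:
        for intensity in row:
            h[intensity] += 1
    return h

frame = [[0, 0, 255],
         [128, 0, 255]]   # a made-up 2x3 grayscale frame
h = histogram(frame)
assert h[0] == 3 and h[128] == 1 and h[255] == 2
assert sum(h) == 6        # the counts add up to W * H pixels
```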

Figure 2.2: Example of image histogram plot

2.2 Video coding

A codec is a processing module responsible for encoding and decoding a signal. The term is used in the video, image and audio processing fields. There are two main types of codecs:


Lossless: assures the encoded signal can be decoded into content identical to the original signal. In general, this implies a much lower compression rate, since no change can be made to the original signal to make it more compressible, and only its statistical redundancy can be explored.

Lossy: allows the encoding process to lose some information from the original signal. It is based on the principle that some information in the signal (minor details) can be discarded without significantly affecting how the signal is perceived. For instance, in the audio area, some frequencies are barely perceptible to human hearing and hence, if removed, they do not degrade the perceived quality too much. Most codecs of this type have a quality control mechanism so that the user can increase or reduce the amount of information the codec may discard to make the signal more compressible.

2.2.1 Data redundancy

With respect to data encoding, it is important to represent the desired information with as few information units as possible. Some types of information require fewer bits than others to be encoded under the same coding system. For example, a video with all frames identical to one another is much more easily coded, with fewer units of information, than a video with very distinct frames.
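This effect is easy to demonstrate with a general-purpose compressor; the sketch below uses zlib purely as an illustration of the principle, not as a video codec:

```python
import random
import zlib

random.seed(1)
frame = bytes(random.randrange(256) for _ in range(1024))

# Video A: the same frame repeated ten times (highly redundant).
video_a = frame * 10
# Video B: ten independent random frames (no redundancy to explore).
video_b = bytes(random.randrange(256) for _ in range(1024 * 10))

size_a = len(zlib.compress(video_a))
size_b = len(zlib.compress(video_b))
assert size_a < size_b          # the redundant video needs far fewer bytes
assert size_a < len(video_a)    # repeats after the first frame cost almost nothing
```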

In order to formalize this concept of information "complexity", a metric is needed that can quantify the amount of information in a signal. With this purpose, Shannon introduced the concept of entropy (SHANNON, 1948) into the information research area, although it was originally proposed for communication theory (IHARA, 1993).
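As an illustration of the concept (a sketch, not from the original text), the Shannon entropy of a frame's empirical intensity distribution can be computed as:

```python
import math

def entropy(pixels):
    """Shannon entropy (bits per symbol) of the empirical intensity distribution."""
    n = len(pixels)
    counts = {}
    for p in pixels:
        counts[p] = counts.get(p, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

flat = entropy([7] * 16)    # a constant frame carries no information
coin = entropy([0, 1] * 8)  # two equally likely intensities: 1 bit per symbol
```

A highly redundant (constant) frame has entropy zero, while a frame with very distinct, unpredictable intensities approaches the maximum of n bits per symbol for an n-bit image.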

Entropy is associated with the level of uncertainty or unpredictability of the information; in other words, with the lack of redundancy in it. In a video, there are four types of data redundancy, or data correlation, that can be exploited in the encoding (ESAKKIRAJAN; VEERAKUMAR; NAVANEETHAN, 2009; GONZALEZ; WOODS, 2006):

Coding Redundancy: refers to the average length of the words used to encode the symbols that occur in the data. It is applicable to many types of data, such as image, audio and text. For example, if a grayscale image containing only black and white pixels is coded at an intensity resolution of 8 bits, the coding is highly redundant, since a single bit would suffice to distinguish the black and white pixels all over the image.

Entropy coders are in general capable of eliminating most coding redundancy. The classic Huffman coding (HUFFMAN, 1952) builds a codeword table assigning the shortest codewords to the symbols that are most frequent in the data.
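A minimal sketch of Huffman table construction (illustrative only; practical entropy coders add canonical ordering and bit packing) shows the shortest codewords going to the most frequent symbols:

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman codeword table {symbol: bitstring} from symbol frequencies."""
    tick = count()  # unique tie-breaker so the heap never compares the dicts
    heap = [(f, next(tick), {s: ''}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        fa, _, a = heapq.heappop(heap)   # two least frequent subtrees
        fb, _, b = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in a.items()}
        merged.update({s: '1' + c for s, c in b.items()})
        heapq.heappush(heap, (fa + fb, next(tick), merged))
    return heap[0][2]

codes = huffman_codes({'a': 50, 'b': 30, 'c': 15, 'd': 5})
```

With these frequencies, 'a' receives a 1-bit codeword while the rare symbols 'c' and 'd' receive 3-bit codewords.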

Spatial Correlation: applies to each video frame separately (intra-frame) and comes from the image compression area. Also known as interpixel redundancy, the idea


is that in most images, pixels spatially close to one another have a high probability of having similar intensity, especially in the interior of regions. This tends to hold over the whole image, except for edge areas. It also relates to human visual perception: when an image with very low spatial correlation is seen, its shapes and details are barely recognizable by the human eye in most cases. This principle is also exploited by encryption in order to explicitly degrade the perceived quality of an image, as shown in Figure 2.3.

(a) original Lena (b) random permutation of Lena image

Figure 2.3: Spatial correlation illustration

The concept of spatial correlation can also be easily exploited by image compression algorithms. A simple example is Run Length encoding, which encodes a sequence of pixels of the same intensity as a pair of parameters: the intensity value and the number of pixels with that intensity.
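The Run Length idea can be sketched as follows (hypothetical helper names, operating on a flat row of pixels):

```python
def rle_encode(pixels):
    """Collapse runs of equal intensities into (value, run_length) pairs."""
    runs = []
    for p in pixels:
        if runs and runs[-1][0] == p:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([p, 1])       # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (value, run_length) pairs back into the pixel sequence."""
    return [v for v, n in runs for _ in range(n)]

row = [9, 9, 9, 9, 4, 4, 7]
```

The more spatially correlated the row, the longer the runs and the fewer pairs are needed.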

Temporal Correlation: this concept involves more than a single frame (inter-frame). It is based on the perceptual principle that consecutive frames tend to be very similar, differing among themselves by small object movements, unless there is a change of scene. This principle is justified by the fact that, were it not true, the video would cause a strange sensation of discontinuous movement of the elements in the scene, which would greatly degrade the video's perceived quality.

Hence, the transition between two consecutive frames in the same scene usually consists of small object movements (motion), camera translation, zoom or minor intensity variation. Figure 2.4 shows, for example, two consecutive frames and the residual difference between them (the difference is centered at the middle grayscale level).

Psychovisual redundancy: consists in the fact that the human visual system does not


(a) previous frame (b) current frame

(c) residual difference of (a) and (b)

Figure 2.4: Temporal correlation

respond equally to all frequencies in a signal (or intensities and colors of pixels), so some of them can be discarded while barely degrading the perceived signal. This type of redundancy requires more specific study of human vision, and it can be more subjective and complicated to estimate properly.

2.2.2 Video frame types

With respect to exploiting data redundancy in video, there are spatial-only codecs that exploit only the redundancy occurring spatially within each frame. However, most codecs also exploit temporal correlation. Hence, the frame coding process can occur in three main ways, which define the three main frame types (MAYER-PATEL; LE; CARLE, 2002):

I-frame: its compression is intra-frame, which means it is encoded and decoded on its own. However, its compression potential is more limited, since only the redundancy within the frame itself is exploited. Spatial-only codecs have only frames of this type.


P-frame: also known as a forward predicted picture. It is compressed based on small changes relative to an earlier coded picture. It is therefore not self-decodable, requiring the reference frame to be decoded first.

B-frame: also known as a bidirectionally predicted picture. It is compressed based on predictions or interpolations of earlier and/or later pictures. Its compression potential is the best among the three types, but its decoding depends on the reference frames. In a long chain of B and P frames, the decoding speed of a given frame can degrade significantly.

2.2.3 Encoding and decoding module

In order to decode a video consistently and efficiently, when a P or B frame is being decoded, its reference frames must appear earlier in the bitstream. Codecs usually read the bitstream linearly and store the decoded frames in a frame buffer module in memory. As the next frames are read and decoded from the bitstream, their decoding, in the case of a P or B frame, can only depend on frames already stored in the frame buffer. This implies that the coded frame order is not necessarily the same as the original video order, as can be seen in Figure 2.5.

(a) original frame order

(b) coded frame order in bitstream

Figure 2.5: Video coding: frame types

To reduce the complexity of decoding a video and the amount of memory it needs, and also to allow the decoding process to "jump" to a given frame, the concept of group


of pictures (GOP) was created. A group of pictures is a self-decodable unit of frames; it can be interpreted as if the large video were divided into a set of multiple independent small videos. Naturally, these units have to start with an I-frame. So whenever a given frame (which is not an I-frame) needs to be decoded from scratch, the decoding module must find which GOP that frame belongs to and decode that GOP from its start until the desired frame is found in the bitstream. In an open GOP, it is possible that a few frames in the current GOP reference frames from a previous one, but such cross-GOP references are usually avoided so as not to degrade the decoding speed.

In B or P frames, the prediction mentioned is basically frame "alignment" (since there may have been object movement or camera translation between the frames), and its compression potential comes from coding the residual difference between the current frame and its prediction. The goal of frame alignment is to minimize the differences (prediction error) between the current and reference frames when computing the difference of pixel intensities at the same coordinates.

To perform frame alignment, motion parameters must first be computed using motion estimation techniques, and these parameters are then applied to the desired frame using motion compensation techniques. The motion estimation parameters are usually computed in the encoding stage and encoded into the bitstream so that the decoder module is able to reconstruct the frame properly.

Codecs, in general, operate on a block basis, applying the encoding process to blocks separately. Some of them allow some blocks of a frame to be intra-coded (like an I-frame) and others inter-frame coded, like a P or B frame. In this sense, the concept of reference frames is extended to reference blocks, where a reference addresses a block in one of the reference frames stored in the frame buffer.

Hence the encoding process for each block is usually the following:

1. Decide if the block should be intra-coded or inter-coded. This decision is usually based on the existence of a similar block in one of the reference frames. The similarity criterion is usually the computation of a block difference metric and is detailed later in Chapter 4. If the block is intra-coded, encode the block itself with the process defined in step 3.

2. Compute motion vector parameters for this block to find a good matching block. The methods for finding such blocks efficiently in the reference frame are detailed later in Chapter 4. The motion vectors are basically vertical and horizontal translations from the position of the current block to the reference one. Encode these motion vectors into the bitstream using an entropy coder.

3. Compute the residual block, which is the aligned difference between the current block and the predicted one. Encode the residual block like a spatial encoder, using an algorithm that exploits spatial correlation. The most common approach for such encoding is


Figure 2.6: Video coding scheme

based on the discrete cosine transform (DCT), which outputs a matrix in which the most significant coefficients (the lower frequencies) are concentrated in one portion of it. After that, a zig-zag scan is performed on this matrix to generate an output sequence ordered from low to high frequencies, in such a way that the coefficient values gradually decay toward the near-zero high-frequency values at the end. The output sequence is then processed by Run Length encoding to remove the redundancy of these long runs of near-zero values, and its output is finally given to an entropy coder.
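The zig-zag scan step can be sketched as follows (an illustrative traversal of an n × n coefficient matrix under one common direction convention; real codecs use fixed scan-order tables):

```python
def zigzag(block):
    """Read an n x n matrix along its anti-diagonals, alternating direction."""
    n = len(block)
    out = []
    for d in range(2 * n - 1):
        cells = [(i, d - i) for i in range(n) if 0 <= d - i < n]
        if d % 2 == 0:
            cells.reverse()  # even diagonals run bottom-left to top-right
        out.extend(block[i][j] for i, j in cells)
    return out

# Matrix labelled so the zig-zag order reads 1, 2, ..., 9:
m = [[1, 2, 6],
     [3, 5, 7],
     [4, 8, 9]]
```

Applied to a DCT coefficient matrix, this places the significant low-frequency coefficients first and groups the near-zero high-frequency coefficients into long tail runs, which the subsequent Run Length stage compresses well.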

The basic coding module scheme can be seen in Figure 2.6. The temporal module is responsible for storing the reference frames, performing motion estimation and computing the residual image after motion compensation. The spatial module exploits the spatial correlation in the residual and outputs the result to the entropy coder. The entropy coder also takes the motion vectors from the temporal module and writes both into the bitstream.

2.3 Permutations in digital videos

A sequence s of length N is an ordered collection of N elements, allowing repetition. Formally, we define:

s = [ x0  x1  x2  ...  xN−1 ]

We denote by s[i] the i-th element of the sequence s. A permutation of a sequence s is a bijection from s to itself (SOCEK et al., 2007), i.e., a mapping of each element to another element of the same sequence such that no target position is repeated and every position is mapped to. Formally, a permutation P of a sequence s is defined as a matrix:

P = [ i0  i1  i2  ...  iN−1 ]

where:

N is the permutation length which matches the sequence length and


ij, where 0 ≤ j < N, is the index into which the j-th element is mapped, and

the notation P[j] is equivalent to ij.

Since P is a bijection, the permutation matrix has the following properties:

∀ x, y ∈ {0, ..., N−1}: x ≠ y → ix ≠ iy, and

∀ x ∈ {0, ..., N−1}, ∃ y ∈ {0, ..., N−1} such that iy = x.

The permutation matrix is defined unidimensionally, but it can be applied to a multidimensional entity (such as a video frame) by simply defining a unidimensional way of traversing the multidimensional entity. In an image or video frame, this can be done by traversing the image row by row. The notation P(s) used in this work represents the application of the permutation P to the sequence s, which can be a frame, for instance. The application of a permutation to a sequence s is defined as the sequence:

P(s) = [ s[P[0]]  s[P[1]]  s[P[2]]  ...  s[P[N−1]] ]

Given a permutation P, the notation P−1 denotes its inverse permutation, i.e., the unique permutation P′ such that P(P′(s)) = s.
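Under the notation above, applying and inverting a permutation can be sketched in Python (hypothetical helper names):

```python
def apply_permutation(P, s):
    """P(s)[j] = s[P[j]], following the definition above."""
    return [s[P[j]] for j in range(len(P))]

def invert_permutation(P):
    """The unique P' with P(P'(s)) = s (and also P'(P(s)) = s)."""
    inv = [0] * len(P)
    for j, i in enumerate(P):
        inv[i] = j   # position i of s is produced by position j of P
    return inv

s = ['a', 'b', 'c', 'd']
P = [2, 0, 3, 1]
```

Because P is a bijection, composing it with its inverse in either order recovers the original sequence.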

Permutations have been widely used in encryption techniques for a long time. Permutation-based transformations are a fundamental building block of modern symmetric key cryptography, used for instance in the AES and DES encryption systems (SINGH; SUPRIYA, 2013).

Permutations have also been used as compression primitives. The Burrows-Wheeler Transform (BWT) (BURROWS et al., 1994) is an example of a permutation-based transform. It operates on blocks of text, generating a block with the same characters but in a different order, with improved spatial correlation.

In the context of encryption, permutations can be applied in different ways: to each video frame separately, or to the bitstream itself. These permutations are usually generated from a secret key which must be shared between both sides of the communication. Such a secret permutation can be applied to pixels or blocks of the frames to completely degrade the video's perceived quality while keeping the bitstream decodable (pure permutation video encryption). Another way is to permute important parts of the bitstream, making the permuted bitstream impossible to decode properly unless it is decrypted.

Among the existing approaches, one consists in randomly scrambling the DCT coefficients of an MPEG video frame based on a secret permutation (TANG, 1996). Another known strategy is to apply the permutation to the codeword table used by the Huffman coding algorithm in the entropy coder module (BHARGAVA; SHI; WANG, 2004). Despite being optimized, both are highly invasive and specific to the codec.


Applying a permutation to the pixels of an image affects the spatial correlation of their neighborhoods. This fact is used by encryption techniques to destroy the spatial correlation, making the image unrecognizable to non-authorized users. However, this drastically degrades the compression rate of the image. On the other hand, if a sorting permutation is used, for instance, bringing together pixels with similar intensities that were distant from each other in the image, the permuted frame becomes highly compressible. This would require the transmission of such a permutation so that the frame could be reconstructed properly, which can be expensive.

According to (SOCEK et al., 2007), sorted and ‘almost-sorted’ frames are strongly spatially correlated and hence can be even more compressible than the original frame when using a spatial-only codec, as shown in Figure 2.7. The ‘almost-sorted’ concept mentioned by the authors refers to the application of the previous frame's sorting permutation to the current frame. For this purpose, it is assumed that consecutive frames are very similar, such that this application results in an almost-sorted image with a few unsorted pixels. The result looks like a gradient-like image with some noise in it.

(a) original image PNG 176x144 22.3KB (b) ‘almost-sorted’ image PNG 176x144 12.5KB

Figure 2.7: Permutation and compressibility


3 Permutation-based video encryption

This chapter presents a detailed description of the video encryption method proposed by Socek et al. (SOCEK et al., 2007). This permutation-based video encryption method was designed to be an efficient and highly compressible solution for video encryption, suitable for real-time applications.

3.1 Cryptographic scheme

The cryptographic scheme assumes the existence of two communication channels. The first one, ChS, is a secure channel, where the transmitted data are encrypted with a safe communication protocol, generally a conventional encryption technique. The second channel, ChR, allows free data transfer, without any protection or security protocol executed on either end.

Both channels can be eavesdropped, but the secure channel requires that the attacker break the security of the encrypted data transmitted through it, which is assumed to be computationally unfeasible. However, any data transferred through the secure channel results in a much bigger bitstream, because of its encryption, so this channel should be used as little as possible, since it would drastically affect the video transmission rate.

This cryptographic scheme treats a spatial-only codec module as a black box, without depending on how it works or how the video is represented in the bitstream. Two functions are assumed to be available from the codec: encode and decode. The following notation is used for the codec module functions used in the system:

The expression E(F) denotes the output of encoding frame F, to be written into the bitstream.

The expression D(F) denotes the decoded version of the frame F read from the bitstream.

Note that for a lossless codec, D(E(F)) = F, but this may or may not hold true for a lossy codec.


Figure 3.1: Overview of cryptographic scheme

The encryption stage in this cryptographic scheme occurs before the encoding stage, in such a way that the encrypted data usually preserves or improves the spatial correlation of the pixels in each frame. An overview of the cryptographic system's architecture is shown in Figure 3.1.

The scheme is a symmetric key cryptosystem, and the protocol includes a setup step for the transmission of the first key, which is the unique sorting permutation of the first frame, through the secure channel. Ownership of the key determines the authenticity of a user. The following frames are transmitted through ChR and are encrypted with the current key, which is the previous frame's sorting permutation. At each step, the current key is updated to the current frame's sorting permutation. Hence, the second frame is encrypted with the first frame's sorting permutation, the third with the second frame's sorting permutation, and so on.

3.1.1 Unique sorting permutation

The scheme’s encryption process is based on generating a key which is the unique sortingpermutation of a frame. The unique permutation restriction comes from the fact that a framecan have multiple sorting permutations and since this permutation is the cryptographic key, bothsides must be able to obtain the exact same sorting permutation so that the frame reconstructionworks correctly.

The sorting permutation is computed with respect to the pixel intensities in the frame. There are many ways this permutation can be computed. The authors indicated a Quicksort variation, shown in Algorithm 3.1, in which the permutation is computed while a copy of the frame is sorted.

This procedure takes a copy of frame F, which is sorted during the process, a permutation P,


which is adjusted accordingly, and two integer parameters which are the beginning and ending indices of the subarray to be sorted. Note that both F and P are seen as unidimensional entities of size W × H.

Algorithm 3.1: Unique sorting permutation method based on Quicksort

 1  procedure unique_sorting_permutation(F, P, left, right):
        Input: frame F, unique sorting permutation P, integers left and right
 2      Initialize i ← left − 1, j ← right, v ← F[left]
 3      if right ≤ left then
 4          return
 5      end
 6      repeat
 7          i ← i + 1
 8          while F[i] < v do
 9              i ← i + 1
10          end
11          j ← j − 1
12          while j > left and F[j] > v do
13              j ← j − 1
14          end
15          if i < j then
16              swap F[i] and F[j]
17              swap P[i] and P[j]
18          end
19      until i ≥ j
20      unique_sorting_permutation(F, P, left, i − 1)
21      unique_sorting_permutation(F, P, i + 1, right)
22  end

A sorted frame (a gradient-like image) is obtained when applying the sorting permutation to the frame itself. However, when this permutation is applied to the next frame, it tends to generate an ‘almost-sorted’ frame, since consecutive frames are usually very similar. The authors suggest that sorted and ‘almost-sorted’ frames are strongly spatially correlated, hence they can be highly compressed by a spatial encoder.

3.1.2 Encryption and decryption algorithms

In Algorithm 3.2, the pseudocode for the encryption method is shown. It is applicable to both lossless and lossy codecs. Notice that, in the case of lossy codecs, the encrypter side needs to compute the unique sorting permutation on the frames as they will be decrypted by the receiver, to make sure both sides obtain the same key for the next frames. In the case of a lossless codec, the sorting permutation can be computed directly on the input frames.


In this process, the first frame is sent through ChS as part of the protocol setup, and the unique sorting permutation of its decoded version is set as the first key. At each step, the next frame is encrypted with the current permutation key and sent to the encoder module before being transmitted through ChR, and the key is updated to the sorting permutation of the currently decrypted frame.

Algorithm 3.2: Video encryption algorithm

    Input: stream of video frames F_1, ..., F_N
 1  begin
 2      F_1^transmitted ← E(F_1)
 3      F_1^decrypted ← D(F_1^transmitted)
 4      P_1 ← unique_sorting_permutation(F_1^decrypted)
 5      Send F_1^transmitted through ChS
 6      for each F_i, where i = 2, ..., N do
 7          F_i^transmitted ← E(P_{i−1}(F_i))
 8          F_i^decrypted ← P_{i−1}^{−1}(D(F_i^transmitted))
 9          Send F_i^transmitted through ChR
10          if i < N then
11              P_i ← unique_sorting_permutation(F_i^decrypted)
12          end
13      end
14  end

The decryption process is simpler, as described in Algorithm 3.3. This algorithm is also valid for both lossless and lossy codecs. Notice that the input of this procedure is the stream of frames transmitted by the encryption process described earlier. Since the notation used in both algorithms is precisely the same, it can be verified that the expression found for the unique sorting permutation (which is the cryptographic key of the system) is exactly the same on both sides, which assures the correctness of the method.

3.2 Advanced extensions

In order to adapt the scheme to specific requirements or to improve certain aspects of it, the authors proposed a set of extensions, which are detailed below.

3.2.1 Perception quality control

A requirement of many video applications is to be able to control how much the perceived quality is degraded in the encrypted video, and also to allow partial encryption. From this perspective, Socek et al. designed a block-based extension.


Algorithm 3.3: Video decryption algorithm

    Input: stream of video frames F_1^transmitted, F_2^transmitted, ..., F_N^transmitted
 1  begin
 2      Receive F_1^transmitted through ChS
 3      F_1^decrypted ← D(F_1^transmitted)
 4      P_1 ← unique_sorting_permutation(F_1^decrypted)
 5      for each F_i^transmitted, where i = 2, ..., N do
 6          Receive F_i^transmitted through ChR
 7          F_i^decrypted ← P_{i−1}^{−1}(D(F_i^transmitted))
 8          if i < N then
 9              P_i ← unique_sorting_permutation(F_i^decrypted)
10          end
11      end
12  end
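The interplay of Algorithms 3.2 and 3.3 can be sketched end-to-end with a toy identity "codec" standing in for a lossless spatial codec (the 1-D frames and helper names are illustrative assumptions, not the original implementation):

```python
def sorting_permutation(frame):
    """Unique (stable) sorting permutation: applying it gathers the frame sorted."""
    return sorted(range(len(frame)), key=lambda k: frame[k])

def apply_p(P, s):
    return [s[P[j]] for j in range(len(P))]

def invert(P):
    inv = [0] * len(P)
    for j, i in enumerate(P):
        inv[i] = j
    return inv

E = D = lambda f: list(f)   # identity "codec": stands in for a lossless codec

frames = [[5, 1, 4, 2], [5, 2, 4, 2], [6, 2, 4, 1]]  # tiny 1-D "video"

# Sender (Algorithm 3.2): first frame through ChS, the rest through ChR.
sent = [E(frames[0])]
key = sorting_permutation(D(sent[0]))
for f in frames[1:]:
    sent.append(E(apply_p(key, f)))                  # encrypt with previous key
    decrypted = apply_p(invert(key), D(sent[-1]))    # what the receiver will see
    key = sorting_permutation(decrypted)

# Receiver (Algorithm 3.3): mirrors the key updates exactly.
out = [D(sent[0])]
key = sorting_permutation(out[0])
for c in sent[1:]:
    f = apply_p(invert(key), D(c))                   # decrypt with the shared key
    out.append(f)
    key = sorting_permutation(f)
```

Because both sides derive each key from the same decrypted frame, the receiver reconstructs the original stream; with a lossy codec, the same sketch would apply as long as both sides compute the keys on the decoded (not the input) frames.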

In this extension, instead of computing the sorting permutation for the whole frame, it is computed for each block of fixed size separately. The perceived quality control in this method is determined by the block size parameter.

As the block size increases, the perceived quality decreases. In the extreme cases: when the block size is 1 × 1, each block's sorting permutation is the identity and hence no actual encryption is performed, while when the block size equals the frame's dimensions, the encryption is total and the method behaves just like the original one.

Notice that when the sorting permutation is restricted to the block space instead of the whole frame, each pixel is potentially permuted within a smaller region, which also reduces the visual impact of the unsorted pixels in the almost-sorted permutation caused by motion between consecutive frames. This can be verified in the high motion sequence example in Figure 3.2, where the almost-sorted permutation errors are much more noticeable at larger block sizes. Hence, this parameter also affects the compressibility of the method, since it potentially affects the spatial correlation of the encrypted frames sent to the encoder module.
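A sketch of the block-based key computation on a flattened 1-D frame (the actual extension uses 2-D blocks; the helper name is hypothetical) makes the two extreme cases explicit:

```python
def blockwise_sorting_permutation(frame, b):
    """Concatenate stable sorting permutations computed per block of size b."""
    P = []
    for start in range(0, len(frame), b):
        block = frame[start:start + b]
        order = sorted(range(len(block)), key=lambda j: block[j])
        P.extend(start + k for k in order)   # targets stay inside the block
    return P

frame = [9, 3, 7, 1, 8, 2]
```

With block size 1 every block's permutation is the identity (no encryption), and with the block size equal to the frame size the result matches the full-frame sorting permutation.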

3.2.2 Handling constant camera translation

The original algorithm is highly sensitive to global camera translation, since consecutive frames do not match properly in this situation and the almost-sorting permutation errors are larger in such frames. In order to minimize this issue, the authors proposed a method of adjusting the sorting permutation with global camera translation parameters.

The idea of this extension is that the sender, before encrypting the video, detects the global camera translation, adjusts the sorting permutation and transmits the translation parameters (tx, ty) to the receiver, which also adjusts the sorting permutation once they are received, to ensure both sides obtain the same key.


(a) original frame (b) 8 x 8 block-based method

(c) 16 x 16 block-based method

Figure 3.2: Block-based approach

Algorithm 3.4 shows the proposed implementation for adjusting the permutation. In a simplified description, the pseudocode adjusts the permutation target of each pixel to the one shifted by the translation parameters. Since some of these pixels may fall outside the boundaries of the image, they are wrapped around and moved to the bottom of the image.

3.2.3 Histogram hiding method

One of the biggest security issues of the original method is that the encrypted frame completely reveals the histogram of the original image. Some applications cannot tolerate such information being revealed. The histogram can hint at some aspects of the image and expose what type of image is being encrypted, since some image types (darker, brighter, cartoon) usually have specific histogram patterns, not to mention that it can make it easier for an attacker to reconstruct the original image.

The authors proposed the histogram hiding method, which consists basically in applying the unique sorting permutation to the consecutive frame difference, instead of the plain current


Algorithm 3.4: Constant camera translation adjustment for sorting permutation

 1  function adjust_sorting_permutation(P, tx, ty):
        Input: sorting permutation P and translation parameters tx and ty
        Output: adjusted sorting permutation P′
 2      Initialize begin ← 0 and end ← W × H
 3      Initialize P′ as a copy of P
 4      for each 0 ≤ k < W × H do
 5          i ← tx + (P[k] mod W)
 6          j ← ty + ⌊P[k] / W⌋
 7          if 0 ≤ j < H and 0 ≤ i < W then
 8              P′[begin] ← j × W + i
 9              begin ← begin + 1
10          else
11              end ← end − 1
12              P′[end] ← (j mod H) × W + (i mod W)
13          end
14      end
15      return P′
16  end
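A direct Python port of the adjustment procedure of Algorithm 3.4 (a sketch on a flat permutation; it relies on Python's modulo semantics to wrap negative coordinates, and collects the tail in reverse to mimic the decreasing end index):

```python
def adjust_sorting_permutation(P, tx, ty, W, H):
    """Shift each permutation target of P by (tx, ty); targets falling outside
    the frame are wrapped around and collected at the bottom of the image."""
    inside, outside = [], []
    for k in range(W * H):
        i = tx + P[k] % W        # shifted column of target P[k]
        j = ty + P[k] // W       # shifted row of target P[k]
        if 0 <= j < H and 0 <= i < W:
            inside.append(j * W + i)
        else:
            outside.append((j % H) * W + (i % W))
    # Algorithm 3.4 fills the tail from the end backwards, hence the reversal.
    return inside + outside[::-1]

# Identity permutation on a 2 x 2 frame, shifted one column to the right:
adjusted = adjust_sorting_permutation([0, 1, 2, 3], 1, 0, 2, 2)
```

The result is still a valid permutation: in-bounds targets keep their shifted positions, and the wrapped ones are appended at the bottom.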

frame. This way, unless the attacker has access to a previously decrypted frame, it cannot obtain any direct histogram information from the encrypted frame.

To allow proper frame reconstruction, the difference between two frames F and G is formalized as in Equations 3.1 and 3.2.

∆(F, G)[x, y] = clip( F[x, y] − G[x, y] + ⌊Imax / 2⌋ )        (3.1)

clip(x) = Imax, if x > Imax;
          x,    if 0 ≤ x ≤ Imax;
          0,    if x < 0.                                      (3.2)

The point of this difference function is to center the pixel intensity difference at ⌊Imax/2⌋, in such a way that intensities above this level represent a positive pixel intensity difference between F and G. The closer the frame difference is to zero, the more the resulting image looks like a plain average gray. This clipping function is only feasible in the lossy scenario, though, because pixel differences whose magnitude exceeds Imax/2 are clipped, which can greatly impact the video quality.

Also, this extension deeply affects the video compressibility. This is justified by the fact that a trivial way to exploit temporal correlation and convert it into spatial correlation in videos is to encode the frame difference, as in the process described in the previous chapter. This can


be noticed in the homogeneity of the encrypted frames shown in Figure 3.3 when using this extension.
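The centered, clipped difference of Equations 3.1 and 3.2 and its inversion can be sketched per pixel as follows (illustrative helpers for the 8-bit case):

```python
IMAX = 255  # 8-bit intensities

def clip(x):
    """Equation 3.2: clamp a value to [0, IMAX]."""
    return max(0, min(IMAX, x))

def delta(F, G):
    """Equation 3.1: pixelwise difference of F and G, centred at IMAX // 2."""
    return [clip(f - g + IMAX // 2) for f, g in zip(F, G)]

def undelta(D, G):
    """Invert delta() wherever no clipping occurred."""
    return [clip(d + g - IMAX // 2) for d, g in zip(D, G)]

prev, cur = [100, 200, 30], [104, 196, 30]
```

Small differences land near the middle gray level 127 and reconstruct exactly; only differences whose magnitude exceeds IMAX // 2 are clipped and thus lost, which is why the method fits the lossy scenario.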

(a) original encrypted algorithm (b) histogram hiding method

(c) original frame

Figure 3.3: Histogram hiding extension

3.2.4 Dealing with frame losses

The original scheme was built under the assumption that no frame is lost in transmission, since the decryption of a given frame depends entirely on its previous frame. However, this scenario is unrealistic for online video applications, so the scheme needs an adaptation to deal with such losses. The scheme should also be adapted to allow more efficient decryption of a given frame without needing to decrypt all previous frames first.

Socek et al. propose to encrypt each group of pictures separately, which means transmitting the initial frame of every GOP through ChS, as in the protocol setup. Since the GOP size is usually small, this allows efficient random frame access during decryption and, in case a frame is lost in transmission, decryption fails only for the remaining frames of the current GOP; as soon as a new GOP is started, the decryption process recovers


normally. A more convenient protocol can be implemented from this perspective that could, for instance, detect the loss of frames and start a new GOP instantly.


4 Extended encryption

The scheme proposed by Socek et al. described in the previous chapter was intended to be used with a spatial-only codec. A deeper study of the impact of motion on the algorithm was out of the scope of the original work, and the technique was designed especially for low motion video sequences. In this chapter, the impact of motion on the scheme's performance is investigated and an extension is designed to enhance the scheme's performance for high motion video sequences.

4.1 Unique sorting permutation

One of the main aspects of the algorithm is the computation of the unique sorting permutation. A single frame can have multiple sorting permutations, and different sorting permutations impact how the next frame will be almost-sorted.

A sorting algorithm is classified as stable if it guarantees that the order of equal-valued elements in the sequence is preserved in the sorted output (HORVATH, 1978). Algorithms like Quicksort are usually unstable, just like the procedure described earlier. Among the possible sorting permutations, a stable one is preferred because of the spatial correlation principle, since keeping the order of equal-valued elements is likely to reduce the average distance between a pixel's original position and its permuted one.

Counting Sort is a stable sorting algorithm, with performance linear in W × H and Imax (CORMEN et al., 2001). The proposed procedure, described in Algorithm 4.1, computes the permutation based on the histogram of the frame. Given the cumulative histogram, an offset vector is derived that defines the position to which a pixel of a given intensity should be moved by the permutation. This offset is incremented each time a pixel of that intensity is found.

By examining the pseudocode, it is easy to notice that the order of pixels with a given intensity is preserved from the original frame. The algorithm performs faster than the Quicksort method described in the previous chapter, and it also requires less space overhead in most cases, since it does not need a copy of the frame, but only the cumulative histogram of the frame.


Algorithm 4.1: Unique sorting permutation method based on Counting Sort

 1  function unique_sorting_permutation(F):
        Input: frame F
        Output: unique sorting permutation P
 2      Initialize histogram H1 as a vector of zeros, one entry per pixel intensity
 3      Initialize cumulative histogram H2 as a vector of size Imax + 2
 4      Initialize permutation P as a vector of size W × H
 5      for each 0 ≤ x < W do
 6          for each 0 ≤ y < H do
 7              H1[F[x][y]] ← H1[F[x][y]] + 1
 8          end
 9      end
10      H2[0] ← 0
11      for each 0 ≤ x ≤ Imax do
12          H2[x + 1] ← H1[x] + H2[x]
13      end
14      for each 0 ≤ x < W do
15          for each 0 ≤ y < H do
16              P[x][y] ← H2[F[x][y]]
17              H2[F[x][y]] ← H2[F[x][y]] + 1
18          end
19      end
20      return P
21  end
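A Python sketch of the counting-sort construction on a flat frame (a hypothetical helper; it outputs the gather-style permutation matching the application P(s)[j] = s[P[j]] of Section 2.3):

```python
def counting_sort_permutation(frame, imax):
    """Stable unique sorting permutation: applying P gathers frame in sorted order."""
    hist = [0] * (imax + 1)
    for p in frame:
        hist[p] += 1
    offs = [0] * (imax + 2)      # offs[x] = first output slot for intensity x
    for x in range(imax + 1):
        offs[x + 1] = offs[x] + hist[x]
    P = [0] * len(frame)
    for k, p in enumerate(frame):
        P[offs[p]] = k           # pixel k is the next occurrence of intensity p
        offs[p] += 1
    return P

frame = [5, 1, 5, 0, 1]
P = counting_sort_permutation(frame, imax=5)
```

Because pixels are scanned in frame order, equal intensities keep their relative order, which is exactly the stability property preferred here.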

4.2 Motion sensitivity

The spatial correlation of the encrypted frame depends directly on the quality of the almost-sorting permutation. That quality is associated with the differences between frames, which are essentially histogram differences and motion (objects unaligned between consecutive frames because of object movement, zoom or camera translation). Therefore, the compressibility and video quality performance of the cryptographic scheme depend on the amount of motion in the video.

One of the extensions proposed by the authors handles a global camera translation; however, the motion in a video usually consists of some local motion from object movements plus global motion, such as camera translation, zooming or rotation. Also, the extension designed by the authors requires the transmission of the translation parameters apart from the video bitstream, since that information cannot be encoded into the bitstream in a codec-independent way. An additional security issue is that information about the scene is exposed by these translation parameters, which would require attention to the encryption and encoding of such

4.3. MOTION ESTIMATION 40

parameters.

4.3 Motion estimation

Interframe prediction is very useful in video coding to exploit the large amount of temporal and spatial correlation existing in video sequences. Most video codecs essentially encode the differences between the current and predicted frames, the latter being based on previous frames. The more accurate the prediction, the smaller the prediction error to be encoded and the higher the compression rate.

In still scenes, the previous frame is usually a very good prediction for the next frame. However, when there is a significant amount of motion, a better prediction would be a frame where the elements that moved between both frames are aligned. This concept of adjusting or aligning displaced objects in frames is referred to as motion compensation. This process usually involves the detection of the motion parameters, known as motion estimation, for later compensation.

Motion estimation techniques are basically classified into two main methods (ARMITANO; FLORENCIO; SCHAFER, 1996):

forward motion estimation (FME): bases the motion estimation on both the current frame and a previously transmitted frame. Since the current frame is not known by the receiver side, the motion parameters need to be transmitted.

backward motion estimation (BME): bases the motion estimation only on frames previously transmitted.

Motion estimation methods fall under two main categories (IRANI; ANANDAN, 2000; TORR; ZISSERMAN, 2000):

Direct methods: compute motion parameters in two aspects at once: the camera motion and the correspondence of every pixel in the frames.

Indirect methods: work on a feature basis. Instead of computing parameters for every pixel in the image, they focus on key features which are simpler to track across frames. On the other hand, they require additional care in selecting good features from which to extract motion parameters, and usually involve a higher computational cost.

With respect to the semantics of the motion parameters, the techniques can usually be classified into one of four main types:

Global motion estimation: the parameters usually indicate camera-related motion, such as translation, rotation and zoom. Hence, these parameters affect every pixel in the image. It is better suited for video sequences with essentially camera motion.


Region-based motion estimation: segments the frame into a set of regions and, for every region, computes motion parameters by attempting to find a correspondence in the reference image. It usually involves image segmentation, which can make it very computationally expensive.

Block-based motion estimation: splits the frame into blocks of fixed or variable size and, for each of them, indicates a matching block in the reference frame, hence computing motion parameters for every block in the image. The computational cost of these approaches is usually very low and the process is highly parallelizable. Most video codecs rely on this type of motion estimation to perform interframe prediction coding.

Pixel-based motion estimation: is essentially a pixel correspondence problem and computes motion parameters for every pixel in the image. The motion parameters in this approach are usually numerous and can be very redundant among pixels in the same region with similar motion.

Considering the level of granularity of the motion parameters, the general-purpose nature, the low computational cost and the intensive research in the area, block-based methods were preferred in the development of the extension.

4.4 Block-based motion estimation algorithms

The block-based techniques are essentially based on the block matching problem. This problem can be defined as locating, for each macroblock in the current frame, the best matching block in a reference frame. The blocks can be defined by dividing the image frame into non-overlapping rectangular regions of a given size W × H. The motion parameters defined by these algorithms are referred to as motion vectors, which model the horizontal and vertical movement, or position displacement, between the matched blocks.

4.4.1 Minimization of dissimilarity function

The block matching problem can be treated as a minimization problem. The objective function to be minimized is usually related to the dissimilarity of the blocks. An optimal solution finds, for each block in the image, the block in the reference image with the minimum dissimilarity value.

There are many dissimilarity functions used in the literature, and the choice of function is one of the distinguishing aspects among the different block matching algorithms. The most commonly used functions are Sum of Absolute Differences (SAD), Sum of Squared Differences or Sum of Squared Errors (SSD or SSE), Mean Absolute Difference (MAD) and Mean Squared Difference or Mean Squared Error (MSD or MSE), which are defined in Equations 4.1, 4.2, 4.3 and 4.4.


SAD(F,G) = ∑_{x=0}^{W−1} ∑_{y=0}^{H−1} |F(x,y) − G(x,y)|    (4.1)

SSE(F,G) = ∑_{x=0}^{W−1} ∑_{y=0}^{H−1} (F(x,y) − G(x,y))²    (4.2)

MAD(F,G) = SAD(F,G) / (W × H)    (4.3)

MSE(F,G) = SSE(F,G) / (W × H)    (4.4)
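The four measures can be expressed directly in Python (an illustrative sketch for two equally sized blocks given as 2-D lists; the function names mirror the acronyms above):

```python
def sad(f, g):
    # Sum of Absolute Differences (Equation 4.1).
    return sum(abs(f[x][y] - g[x][y])
               for x in range(len(f)) for y in range(len(f[0])))

def sse(f, g):
    # Sum of Squared Errors (Equation 4.2).
    return sum((f[x][y] - g[x][y]) ** 2
               for x in range(len(f)) for y in range(len(f[0])))

def mad(f, g):
    # Mean Absolute Difference (Equation 4.3).
    return sad(f, g) / (len(f) * len(f[0]))

def mse(f, g):
    # Mean Squared Error (Equation 4.4).
    return sse(f, g) / (len(f) * len(f[0]))
```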

Finding the optimal solution for this problem is usually computationally infeasible in a real-time context, even with a high level of parallel computing. The most straightforward block matching algorithm is Full Search, which considers all candidate blocks in a limited rectangular region, computing the dissimilarity function for each of them and returning the best matching block in the region.

Despite guaranteeing the global minimum in the region, the Full Search algorithm is extremely inefficient, which motivated many different algorithms to be proposed (PO; MA, 1996; JAMKAR et al., 2002; JAIN; JAIN, 1981; LI; ZENG; LIOU, 1994; ZHU; MA, 2000; KOGA et al., 1981). These algorithms are suboptimal and usually settle for an acceptable local minimum, with a fast implementation. Two of them are detailed here: Three Step Search and Two Dimensional Logarithmic Search.
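For reference, Full Search can be sketched as follows (the function name and parameters are assumptions of this sketch: `b` is the block size, `r` the search radius, and SAD is used as the dissimilarity function):

```python
def full_search(cur, ref, bx, by, b=8, r=7):
    """Best motion vector for the block of `cur` at (bx, by), by
    exhaustively scanning every displacement within +/- r in `ref`."""
    w, h = len(cur), len(cur[0])
    best, best_mv = None, (0, 0)
    for dx in range(-r, r + 1):
        for dy in range(-r, r + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + b <= w and
                    0 <= by + dy and by + dy + b <= h):
                continue
            cost = sum(abs(cur[bx + i][by + j] - ref[bx + dx + i][by + dy + j])
                       for i in range(b) for j in range(b))
            if best is None or cost < best:
                best, best_mv = cost, (dx, dy)
    return best_mv, best
```

The nested scan over the (2r+1)² displacements is exactly what makes Full Search a useful accuracy baseline but too slow in practice.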

4.4.2 Three Step Search (TSS) algorithm

Proposed in 1981 (KOGA et al., 1981), this algorithm is efficient, quite simple, and obtains a near-optimal block in most scenarios. The algorithm was originally intended for video conference applications. Figure 4.1 shows a visual example of the algorithm's process, which can be summarized in the following steps:

1. Choose an initial step size, which is usually 4 pixels (so that the algorithm ends up performing only three steps), and let the current search point be the same as the block's location.

2. Take the eight pixels at a horizontal or vertical distance of the step size from the center, and consider the block centered on each of these pixels as a candidate block.

3. If the minimum dissimilarity is at one of the blocks of the eight surrounding pixels, move the search point to it. The step size is then halved and step 2 is repeated, until the step size is smaller than 1.
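The three steps above can be sketched as follows (an illustrative implementation; `cost` stands for any dissimilarity function of a candidate displacement, such as the SAD of the corresponding blocks):

```python
def three_step_search(cost, step=4):
    """Return the displacement found by TSS and its cost."""
    cx, cy = 0, 0
    best = cost(cx, cy)
    while step >= 1:
        move = None
        # Evaluate the eight neighbours at the current step distance.
        for dx in (-step, 0, step):
            for dy in (-step, 0, step):
                if dx == 0 and dy == 0:
                    continue
                c = cost(cx + dx, cy + dy)
                if c < best:
                    best, move = c, (cx + dx, cy + dy)
        if move is not None:
            cx, cy = move       # follow the best surrounding candidate
        step //= 2              # halve the step; stops after step 1
    return (cx, cy), best
```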

Figure 4.1: Example of the Three Step Search algorithm's convergence: the numbers indicate the centers of candidate blocks at each execution step.

The main problem of this method arises for small motion parameters: the algorithm only starts to move in the correct motion direction once the step size is already small, and it will most likely stop (the step size becoming lower than 1) before it has the chance to locate the precise direction of the optimal block.

4.4.3 Two Dimensional Logarithmic Search (TDL) algorithm

This algorithm (JAIN; JAIN, 1981) is conceptually similar to the TSS algorithm, though usually more accurate, at a slightly higher average cost. An example of the algorithm's convergence is shown in Figure 4.2, and it can be described by the following steps:

1. Choose an initial step size and let the current search point be the original block's location.

2. Consider the following five pixels: the current search point and the ones at a vertical or horizontal distance equal to the step size (diagonals excluded). The five candidate blocks are the ones centered on these five pixels.

3. If the best matching block is the one in the center, the step size is halved. Otherwise, move the search point to the best match position and repeat step 2. When the step size becomes 1, execute step 2 considering the diagonals as well, similarly to TSS, and return the best matching block among the nine candidates.
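These steps can be sketched as follows (illustrative names; as above, `cost` maps a candidate displacement to its dissimilarity value):

```python
def tdl_search(cost, step=4):
    """Return the displacement found by TDL and its cost."""
    cx, cy = 0, 0
    best = cost(cx, cy)
    while step > 1:
        move = None
        # The four axis-aligned neighbours at the current step distance.
        for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
            c = cost(cx + dx, cy + dy)
            if c < best:
                best, move = c, (cx + dx, cy + dy)
        if move is None:
            step //= 2          # best match is the centre: refine
        else:
            cx, cy = move       # otherwise follow the best candidate
    # step == 1: examine all eight neighbours, diagonals included.
    bx, by = cx, cy
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            c = cost(cx + dx, cy + dy)
            if c < best:
                best, bx, by = c, cx + dx, cy + dy
    return (bx, by), best
```

Since the search point only moves on a strict improvement, the loop always terminates.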


Figure 4.2: Example of the Two Dimensional Logarithmic Search algorithm's convergence: the numbers indicate the centers of candidate blocks at each execution step.

Notice that it can be proved that this algorithm always stops: at each point, either the step size is halved, eventually reaching 1, or a better matching candidate is found, which can only happen a finite number of times before reaching the globally best matching block.

4.5 Motion compensated ‘almost-sorted’ permutation

The idea for the proposed extension (SABINO et al., 2013) is similar to the motion compensation principle in video coding and can be summarized as: if the previous frame is motion compensated before its sorting permutation is computed, it is more likely that such a permutation results in a better 'almost-sorting' one for the next frame.

Although FME gives more precise motion estimates, it requires either that the bitstream contain the motion vector information or that the vectors be transmitted separately. This represents a limitation on the types of codec usable by the scheme and would sacrifice its codec independence, so BME techniques are preferred. A BME drawback is that the motion estimation algorithm needs to be executed on both the encoder and decoder sides, but some specific algorithms can be applied without compromising the processing-time requirement of the original scheme.

The point of the extension is to predict the current frame based on previous frames. In order to do that, the motion is assumed to be linear. This assumption can be made "because of the high frame rates of modern video content", as pointed out by (KLOMP; OSTERMANN, 2011). The frame is predicted assuming the preservation of the motion between previous frames. For this, let F−1 and F−2 be the two previous frames already encrypted/decrypted by the scheme. If the motion estimation information is calculated from F−2 to F−1 and then extended linearly to F, motion compensation can be applied to estimate F, as shown in Figure 4.3.

Figure 4.3: Linear motion estimation principle for frame prediction

A similar principle was applied in (KLOMP; OSTERMANN, 2011), but estimating F as an interpolation of frames F−1 and F+1. However, that technique relies on B-frames, since it requires F+1 to be available when encoding F, and the video encryption scheme assumes that the encryption of a given frame depends only on previously encoded frames. As suggested by the authors, this can be done without B-frames if F is estimated as the extrapolation of frames F−1 and F−2; hence this was the technique chosen for the video encryption scheme.

Let the motion parameters be the block motion estimation parameters from F−2 to F−1. They are then the motion vectors that indicate a translation from blocks of frame F−1 to blocks of the previous frame F−2, forming a matrix of motion vectors as shown in Figure 4.4. Inverting the orientation of every motion vector gives a translation to a block on the predicted frame F. That block in F therefore has two reference blocks (one in F−2 and another in F−1) and can be estimated by averaging the pixel values of the reference blocks.
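This extrapolation step can be sketched as follows (an illustrative sketch, not the published implementation: the function and parameter names, the block size `b`, and the fallback of copying the previous frame for uncovered pixels are assumptions; `vectors[i][j]` is the motion vector from block (i, j) of the previous frame to its match in the frame before it):

```python
def predict_frame(f1, f2, vectors, b=8):
    """Predict F from F_-1 (f1) and F_-2 (f2) by linear extrapolation."""
    w, h = len(f1), len(f1[0])
    # Fallback: uncovered pixels keep the previous frame's value.
    pred = [row[:] for row in f1]
    for bi in range(0, w, b):
        for bj in range(0, h, b):
            dx, dy = vectors[bi // b][bj // b]   # vector F_-1 -> F_-2
            for i in range(bi, min(bi + b, w)):
                for j in range(bj, min(bj + b, h)):
                    x, y = i - dx, j - dy        # inverted vector lands on F
                    if (0 <= x < w and 0 <= y < h and
                            0 <= i + dx < w and 0 <= j + dy < h):
                        # Average of the two reference pixels.
                        pred[x][y] = (f1[i][j] + f2[i + dx][j + dy]) // 2
    return pred
```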

4.6 Extended cryptographic scheme

The algorithm notation is extended as follows:

F_i^predicted is the prediction of frame F_i based on its two previous frames.

predict_frame is a subroutine that computes the predicted frame, taking the two previous frames as input, using the process described in the previous section.

Figure 4.4: Example of block motion vector parameters in a frame

The pseudocode for the encryption and decryption algorithms can be seen in Algorithms 4.2 and 4.3. For lossless codecs, the frame prediction process can be done directly on the input frames.

Algorithm 4.2: Motion compensated video encryption algorithm

Input: Stream of video frames F_1, ..., F_N

begin
    F_1^transmitted ← E(F_1)
    F_1^decrypted ← D(F_1^transmitted)
    P_1 ← unique_sorting_permutation(F_1^decrypted)
    Send F_1^transmitted through ChS
    for each F_i, where i = 2, ..., N do
        F_i^transmitted ← E(P_{i−1}(F_i))
        F_i^decrypted ← P_{i−1}^{−1}(D(F_i^transmitted))
        F_{i+1}^predicted ← predict_frame(F_i^decrypted, F_{i−1}^decrypted)
        Send F_i^transmitted through ChR
        if i < N then
            P_i ← unique_sorting_permutation(F_{i+1}^predicted)
        end
    end
end


Algorithm 4.3: Motion compensated video decryption algorithm

Input: Stream of video frames F_1^transmitted, F_2^transmitted, ..., F_N^transmitted

begin
    Receive F_1^transmitted through ChS
    F_1^decrypted ← D(F_1^transmitted)
    P_1 ← unique_sorting_permutation(F_1^decrypted)
    for each F_i^transmitted, where i = 2, ..., N do
        Receive F_i^transmitted through ChR
        F_i^decrypted ← P_{i−1}^{−1}(D(F_i^transmitted))
        if i < N then
            F_{i+1}^predicted ← predict_frame(F_i^decrypted, F_{i−1}^decrypted)
            P_i ← unique_sorting_permutation(F_{i+1}^predicted)
        end
    end
end

The predict_frame function can be implemented using the method described earlier, and any block motion estimation algorithm may be used under the following restrictions:

the algorithm must be deterministic: this is a necessary condition since the cryptographic key is computed based on the unique sorting permutation of the predicted frame; hence the motion vectors obtained for two consecutive frames must be exactly the same on both sides.

the motion vector parameters can be extended linearly: this is an important condition for the frame prediction process, which relies on the linear extrapolation of the motion vectors towards the predicted frame. It means that any future frame can be linearly predicted from the motion vector information.

4.7 Histogram hiding extension

The histogram hiding extension proposed by Socek et al. can be incorporated into this motion compensation extension. As explained in Chapter 2, the most conventional approaches for exploiting temporal correlation in digital video coding encode the residual difference between consecutive frames with a spatial module. Hence, not only is this histogram hiding extension a potential security improvement, it could also improve the compressibility of the encrypted frames.

An important concern is whether the residual difference alone can be considered an encryption of the video's information. As explained in (SOCEK et al., 2007), encoding the residual difference is usually not enough to hide the video information, since the quality of perception can still be high, especially in high-motion sequences. Therefore, the residual difference must be encrypted with the sorting permutation.

Figure 4.5: Consecutive frames residual difference - Flower sequence. (a) previous frame; (b) current frame; (c) residual difference of (b) and (a) - PNG 148 KB; (d) motion compensated residual of (b) and (a) - PNG 113 KB.

The motion compensation principle used in this extension is applied in the computation of the residual difference. It means that, instead of computing the residual difference between the current frame and its previous frame to hide the histogram, the difference is taken between the current frame and the predicted one. This is more likely to produce a smaller residual image.

To understand this principle visually, Figure 4.5 shows the residual of consecutive frames with and without motion compensation, using the prediction method described earlier with the TDL block motion estimation algorithm. Notice that the residual image after motion compensation is more strongly spatially correlated. Also, in Figure 4.6, it can be seen that the residual images themselves are temporally correlated, which means the sorting permutation process can encode such correlation and optimize its compressibility.


Figure 4.6: Consecutive residual frames with motion compensation - Flower sequence

4.7.1 Encryption and decryption extended algorithms

To define the algorithms, the following notation is used:

F_i^residual corresponds to the residual difference frame between frames F_i and F_i^predicted. In practice, this holds for i > 2. For the other cases, F_i^residual = F_i − F_{i−1}, since the frame prediction needs at least two previously decrypted frames.

F_i^decrypted_residual is the decrypted version of F_i^residual seen at the receiver side (in the case of lossy codecs, this may differ from F_i^residual).

residual is the subroutine that computes the residual difference of its two frame parameters, as described in the previous chapter.

restore_frame is the subroutine that restores the current frame given its previous frame and the residual difference.

The encryption and decryption methods can be seen in Algorithms 4.4 and 4.5. Notice that only from the third frame on is the residual computed using motion compensation. The second frame is encrypted without the histogram hiding extension, since there is no previous residual sorting permutation.


Algorithm 4.4: Motion compensated video encryption algorithm on the frame residuals

Input: Stream of video frames F_1, ..., F_N

begin
    F_1^transmitted ← E(F_1)
    F_1^decrypted ← D(F_1^transmitted)
    P_1 ← unique_sorting_permutation(F_1^decrypted)
    Send F_1^transmitted through ChS
    F_2^transmitted ← E(P_1(F_2))
    F_2^decrypted ← P_1^{−1}(D(F_2^transmitted))
    F_3^predicted ← predict_frame(F_2^decrypted, F_1^decrypted)
    F_2^decrypted_residual ← residual(F_2^decrypted, F_1^decrypted)
    P_2 ← unique_sorting_permutation(F_2^decrypted_residual)
    Send F_2^transmitted through ChR
    for each F_i, where i = 3, ..., N do
        F_i^residual ← residual(F_i, F_i^predicted)
        F_i^transmitted ← E(P_{i−1}(F_i^residual))
        F_i^decrypted_residual ← P_{i−1}^{−1}(D(F_i^transmitted))
        F_i^decrypted ← restore_frame(F_i^predicted, F_i^decrypted_residual)
        F_{i+1}^predicted ← predict_frame(F_i^decrypted, F_{i−1}^decrypted)
        Send F_i^transmitted through ChR
        if i < N then
            P_i ← unique_sorting_permutation(F_i^decrypted_residual)
        end
    end
end


Algorithm 4.5: Motion compensated video decryption algorithm on the frame residuals

Input: Stream of video frames F_1^transmitted, F_2^transmitted, ..., F_N^transmitted

begin
    Receive F_1^transmitted through ChS
    F_1^decrypted ← D(F_1^transmitted)
    P_1 ← unique_sorting_permutation(F_1^decrypted)
    Receive F_2^transmitted through ChR
    F_2^decrypted ← P_1^{−1}(D(F_2^transmitted))
    F_3^predicted ← predict_frame(F_2^decrypted, F_1^decrypted)
    F_2^decrypted_residual ← residual(F_2^decrypted, F_1^decrypted)
    P_2 ← unique_sorting_permutation(F_2^decrypted_residual)
    for each F_i^transmitted, where i = 3, ..., N do
        Receive F_i^transmitted through ChR
        F_i^decrypted_residual ← P_{i−1}^{−1}(D(F_i^transmitted))
        F_i^decrypted ← restore_frame(F_i^predicted, F_i^decrypted_residual)
        F_{i+1}^predicted ← predict_frame(F_i^decrypted, F_{i−1}^decrypted)
        if i < N then
            P_i ← unique_sorting_permutation(F_i^decrypted_residual)
        end
    end
end


5 Cryptographic scheme analysis

This chapter conducts an analysis of the cryptographic scheme. The main aspects described here are space, processing time, security, and limitations of the scheme.

5.1 Space and processing time analysis

Processing time is one of the most important specific requirements of video encryption. In this section, some of the cryptographic scheme's implementation details are considered, and the processing time and space usage of the scheme are analyzed.

The total processing time consumed by the encryption and decryption processes of the basic algorithms proposed by Socek consists of the following steps:

unique sorting permutation computation

inverse permutation computation

applying permutation on the frame

encoding and decoding of a given frame

The unique sorting permutation's time complexity depends on the algorithm chosen. Both mentioned approaches are considered in this analysis. The Quicksort variant method has, in the worst case, quadratic performance on the frame matrix of size W × H; hence the complexity of this operation is O(W² × H²). In fact, this could be lowered since, as indicated by the authors, any key-comparison sorting algorithm could be used instead of Quicksort in a similar way. If a Mergesort-based method were used, the complexity would be reduced to O(WH × log(WH)).

However, if the stable sorting method based on Counting sort is used, this step executes in time O(Imax + WH), linear in both the frame matrix size and the number of possible intensity values. Imax is usually small and fixed (in most cases 255, using 8-bit intensity resolution) compared to the frame dimensions; hence this is much more efficient in practice than the Quicksort variant method, especially at higher resolutions.

With respect to the block-based approach proposed by Socek, the time complexity of the sorting permutation with respect to the block dimensions Bx and By can be defined as the product of the number of blocks and the sorting cost of each block, as shown in Equation 5.1.

T(W, H, B_x, B_y) = O(⌈W/B_x⌉ × ⌈H/B_y⌉ × B_x B_y × log(B_x B_y))    (5.1)

As the block size is lowered towards 1 × 1, this time complexity tends to linear in W × H, bringing the quality of perception closer to the original video.

Although not defined explicitly by the scheme's creators, the inverse permutation computation can easily be done in time linear in W × H. The basic idea is to iterate over every position in the matrix, defining P′[P[i]] ← i. Applying the permutation to a given frame can also trivially be done in linear time, if it is done on a copy of the original frame.
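Both linear-time operations can be sketched with the permutation stored as a flat list mapping source index to destination index (an illustrative sketch):

```python
def invert_permutation(p):
    # P'[P[i]] = i: one pass over the permutation.
    inv = [0] * len(p)
    for i, target in enumerate(p):
        inv[target] = i
    return inv

def apply_permutation(p, pixels):
    # Scatter each pixel to its destination in a fresh copy.
    out = [0] * len(pixels)
    for i, v in enumerate(pixels):
        out[p[i]] = v
    return out
```

Applying a permutation and then its inverse restores the original data, which is exactly the decryption path of the scheme.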

Hence, the overall time complexity of the original algorithm is determined by the unique sorting permutation process. Using the proposed method with the Counting sort variant algorithm, the overall complexity is O(Imax + W × H).

The histogram hiding extension does not affect the time complexity in any way, since the frame difference computation is done in linear time. For the camera translation extension defined by the authors, the cost increases only by the cost of computing the global motion parameters, since the permutation adjustment process also has linear performance in W × H.

As for the motion compensation extension proposed by this dissertation, the new operations that could affect the time complexity of the scheme are essentially the following:

block motion estimation algorithm

frame prediction based on the motion vectors

The frame prediction process based on the motion vectors can trivially be seen as a linear process. On the other hand, the block motion estimation cost depends heavily on the chosen algorithm. The complexity of the three mentioned methods is analyzed below.

The most straightforward time complexity analysis is that of the Full Search algorithm. Basically, the cost is the product of the number of blocks in the image and the cost of each block's motion estimation, which corresponds to the size of the rectangular area where it performs full search. Notice also that computing the dissimilarity function for each candidate block has cost Bx × By. Denoting the rectangular area dimensions by Rx and Ry, the time complexity is defined by Equation 5.2.

5.2. SECURITY ANALYSIS 54

T_FullSearch(W, H, B_x, B_y, R_x, R_y) = O(⌈W/B_x⌉ × ⌈H/B_y⌉ × (R_x × R_y) × (B_x × B_y))    (5.2)

For the Three Step Search algorithm, the time complexity depends on the initial step size. By analyzing the algorithm, it is possible to notice that the number of steps executed for each block is exactly log(step size) + 1. Hence, the overall cost of the algorithm is as shown in Equation 5.3.

T_TSS(W, H, B_x, B_y, step) = O(⌈W/B_x⌉ × ⌈H/B_y⌉ × log(step) × (B_x × B_y))    (5.3)

The time complexity of the Two Dimensional Logarithmic Search algorithm is more complicated to estimate with a function, since it can potentially be very expensive: in the worst case, for a given block, it could slowly traverse a large part of the image frame, finding slightly better candidates at each move. Still, an estimate based on the step size and the maximum value Dmax of the dissimilarity function is given by Equation 5.4.

T_TDL(W, H, B_x, B_y, step, D_max) = O(⌈W/B_x⌉ × ⌈H/B_y⌉ × log(step) × B_x × B_y × D_max)    (5.4)

The encoding and decoding time complexity is considered out of scope in this analysis, since it depends on the chosen codec and covers a wide range of possible scenarios.

Regarding space usage, every operation described in this dissertation requires at most a constant number of copies of the frame itself, the permutation matrix, or vectors of size Imax; hence the overall space complexity is linear in W × H and Imax.

5.2 Security analysis

In this section, the security aspect of the cryptographic scheme is briefly analyzed and some types of attack are studied. A more detailed security analysis can be seen in (SOCEK et al., 2007). Formally, the cryptographic scheme can be defined as follows:

the cryptographic key is the unique sorting permutation of the previous frame

the plaintext is the original video's content to be encrypted/decrypted by the algorithms

the cyphertext is the encrypted video, i.e., the plaintext permuted by the cryptographic key

5.2. SECURITY ANALYSIS 55

5.2.1 Brute force attack

This type of attack, also known as exhaustive key search, consists of testing every possible key until the correct one is found. It is usually the last resort and the most naive approach for the attacker, used when there is no other security vulnerability in the system.

To show that the system is safe against this type of attack, it must be shown that it is computationally infeasible to iterate over the key search space, since in the worst case the attacker would need to traverse the whole space to find the correct key. Hence, it is necessary to calculate the key search space size. In this cryptographic system, there are two main targets for the attacker:

the transmission of the first key through ChS

the transmission of the encrypted frame through ChR

For the first key case, assuming ChS represents a conventional cryptographic scheme, such as AES with n-bit keys, the key search space size is 2^n.

For the second case, the number of possible keys is the number of possible sorting permutations for a given frame. To find this number, one should notice that for each permutation of the encrypted frame there is a single sorting permutation. Hence, the number of distinct frames that can be formed from permutations of the encrypted frame corresponds to the number of possible keys. Let F be the encrypted frame, of dimensions W × H, with histogram h(x); the key search space size Ω (the number of distinct permuted frames) can be obtained by Equation 5.5.

Ω = (W × H)! / ∏_{x=0}^{Imax} h(x)!    (5.5)
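Equation 5.5 is a multinomial coefficient of the histogram and can be computed exactly with arbitrary-precision integers (an illustrative sketch; the histogram is given as a list of pixel counts per intensity):

```python
from math import factorial

def key_space_size(histogram, w, h):
    """Number of distinct frames reachable by permuting a W x H frame
    with the given intensity histogram (Equation 5.5)."""
    omega = factorial(w * h)
    for count in histogram:
        omega //= factorial(count)   # divides exactly: multinomial
    return omega
```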

Notice that, in practice, using reasonably high video resolutions, the number of possible keys is usually far higher than the key space of the scheme used in ChS. However, for very narrow histograms, i.e., with only a few intensity values in the image, this number can become extremely small. In the worst-case scenario, where the frame has only a single intensity, there is only one possible key, which is the frame's direct sorting permutation. Hence, for video sequences with very narrow histograms, this cryptographic scheme must not be used unless it is combined with the histogram hiding extension.

5.2.2 Known-plaintext and chosen-plaintext attack

The known-plaintext type of attack assumes the attacker has access to a plaintext and its corresponding cyphertext (KATZ; LINDELL, 2007). The idea is that the attacker should not be able to figure out the mapping function from a pair of plaintext and cyphertext, nor be able to obtain the key (or part of it) for an unknown cyphertext. The unknown cyphertext refers to encrypted data with no correlation to the plaintext and cyphertext examples the attacker has access to.

In the chosen-plaintext attack, the attacker has access to the cyphertext of any arbitrary plaintext. Encryption methods based on permutations generated by a secret key are usually considered weak against these types of attack (LI et al., 2004). However, the cryptographic scheme proposed by Socek is safe against both attacks, because the permutation used in the scheme depends only on the plaintext, not on a secret key. Hence, the key itself depends only on the plaintext, and having access to example pairs of frames and encrypted frames does not give the attacker any relevant information about an unknown frame.

5.2.3 Chosen-cyphertext attack

In this type of attack, the attacker is able to decrypt a set of cyphertexts of his choice (KATZ; LINDELL, 2007), and this must not give him any information about the decryption of an unknown cyphertext. Again, in the context of the cryptographic scheme, the attacker would only have access to the inverse permutations of different frames, and since the key depends only on the unknown frame, he would gain no knowledge about the decryption of that frame.

On the other hand, although not in the theoretical scope of this type of attack, if the attacker has access to a given frame, he is able to decrypt every following frame. This vulnerability is partially fixed by the GOP encryption extension that handles frame loss.

5.3 Codec module limitations

The original scheme proposed by Socek et al. is defined for the class of spatial-only codecs, which may be lossless or lossy. Although the scheme appears generic enough to use any encoder/decoder module, the exact restrictions are studied in this section.

By examining the encryption algorithms, it is possible to notice that, at the moment of encoding a given frame, it is strictly necessary to have immediate access to its output, in order to know how the receiver side will decode that frame. This is needed so that both sides generate the same sorting permutation. The impact is that no frame can be buffered for later encoding, disallowing any encoding-latency related features, such as B-frames based on future frames. Encoding latency is a technique that allows frame buffering to decide the best way of encoding a given frame; it is usually possible to configure in the encoder how far, in frame numbers, the latency can go.

Encoding latency and frame bitstream reordering are therefore not allowed in this cryptographic scheme's codec module. This removes some of the potential of most codecs; however, most real-time applications do not allow such features and have this constraint themselves, since they need immediate access to the frames, which makes the constraint very tolerable.


6 Experiments and results

This chapter presents the experiments executed to evaluate the cryptographic scheme's performance and to assess the improvement coming from the extensions proposed in this dissertation. The analysis here is limited to video quality and compressibility.

6.1 Video quality and compressibility metrics

The most common metric to evaluate video coding compression is the compression rate, defined as the ratio of the coded video bitstream size to the original video size. Another metric used in the following analysis is the bitstream size reduction. Both metrics are defined in Equations 6.1 and 6.2.

COMPRESSION_RATE = coded video bitstream size / original video bitstream size   (6.1)

BITSTREAM_SIZE_REDUCTION = 1 − COMPRESSION_RATE   (6.2)
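For concreteness, the two metrics can be computed directly from the bitstream sizes. The sketch below is in Python and uses hypothetical sizes chosen only for illustration:

```python
def compression_rate(coded_size_kb: float, original_size_kb: float) -> float:
    """Ratio of coded bitstream size to original size (Equation 6.1)."""
    return coded_size_kb / original_size_kb

def bitstream_size_reduction(coded_size_kb: float, original_size_kb: float) -> float:
    """Fraction of the original size saved by the coding (Equation 6.2)."""
    return 1.0 - compression_rate(coded_size_kb, original_size_kb)

# Hypothetical sizes, for illustration only: a 10,000 KB raw sequence
# coded down to 2,500 KB.
print(compression_rate(2500, 10000))         # 0.25
print(bitstream_size_reduction(2500, 10000)) # 0.75
```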

For the next experiments, in the lossy codec scenario, the quality degradation of the coded video is measured by the average PSNR (peak signal-to-noise ratio) of the video sequence frames. PSNR is a widely used metric in signal processing. Let Imax be the maximum pixel intensity with respect to the intensity resolution; the PSNR of a distorted image F with respect to a reference image G is computed by Equation 6.3.

PSNR(F, G) = 10 × log10( Imax² / MSE(F, G) )   (6.3)

Note that for identical images the PSNR value is undefined, since the MSE in the denominator is equal to zero. The higher the PSNR value, the better the indicated video quality, i.e., the smaller the noise in the coded image.
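A minimal Python sketch of the MSE and PSNR computations for 8-bit grayscale frames follows (assuming NumPy is available; returning infinity for identical frames is a convention chosen here to handle the undefined case):

```python
import numpy as np

def mse(f: np.ndarray, g: np.ndarray) -> float:
    """Mean squared error between two equally sized grayscale frames."""
    return float(np.mean((f.astype(np.float64) - g.astype(np.float64)) ** 2))

def psnr(f: np.ndarray, g: np.ndarray, i_max: int = 255) -> float:
    """PSNR in dB (Equation 6.3). Undefined for identical frames
    (MSE = 0); infinity is returned in that case by convention."""
    m = mse(f, g)
    if m == 0.0:
        return float("inf")
    return 10.0 * float(np.log10((i_max ** 2) / m))
```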

Despite being a good metric to detect signal distortion in general, several studies show that PSNR fails to capture perceived image quality properly, i.e., it can yield similar PSNR values for images whose quality is perceived very differently by the human visual system (GIROD, 1993; TEO; HEEGER, 1994; ESKICIOGLU; FISHER, 1995). Hence, another image quality metric is used in this study: SSIM, the Structural Similarity Index (WANG et al., 2004).

SSIM is measured on windows of size N×N for two images F and G as in Equation 6.4:

SSIM(F, G) = [ (2 × µF × µG + C1) × (2 × σFG + C2) ] / [ (µF² + µG² + C1) × (σF² + σG² + C2) ]   (6.4)

where:

µF and µG are the averages of pixel intensities of F and G respectively in the block.

σF² and σG² are the variances of pixel intensities of F and G respectively in the block.

σFG is the covariance of F and G.

the constants C1 and C2 are used to stabilize the division when the denominator is weak. They are chosen as C1 = (k1 × (Imax − Imin + 1))² and C2 = (k2 × (Imax − Imin + 1))², with k1 set to 0.01 and k2 to 0.03 by default.

The overall image quality is evaluated as the MSSIM, which is the average SSIM over all disjoint blocks of both images. The video-level metric is the average MSSIM over all frames. The SSIM is computed on 8×8 image windows in the experiments of the next sections.
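The block-wise SSIM and the MSSIM average can be sketched as follows (a Python illustration assuming NumPy; function and variable names are ours, frames are assumed to be 8-bit grayscale, and edge blocks that do not fit are simply skipped):

```python
import numpy as np

def ssim_block(f: np.ndarray, g: np.ndarray, c1: float, c2: float) -> float:
    """SSIM of a single window (Equation 6.4)."""
    mu_f, mu_g = f.mean(), g.mean()
    var_f, var_g = f.var(), g.var()
    cov_fg = ((f - mu_f) * (g - mu_g)).mean()
    return (((2 * mu_f * mu_g + c1) * (2 * cov_fg + c2)) /
            ((mu_f ** 2 + mu_g ** 2 + c1) * (var_f + var_g + c2)))

def mssim(f: np.ndarray, g: np.ndarray, n: int = 8,
          i_max: int = 255, i_min: int = 0,
          k1: float = 0.01, k2: float = 0.03) -> float:
    """Average SSIM over all disjoint n x n blocks of both frames."""
    dyn = i_max - i_min + 1                       # dynamic range
    c1, c2 = (k1 * dyn) ** 2, (k2 * dyn) ** 2
    f, g = f.astype(np.float64), g.astype(np.float64)
    h, w = f.shape
    blocks = [ssim_block(f[i:i + n, j:j + n], g[i:i + n, j:j + n], c1, c2)
              for i in range(0, h - n + 1, n)
              for j in range(0, w - n + 1, n)]
    return float(np.mean(blocks))
```

For identical frames every block yields SSIM equal to 1, so the MSSIM is 1 as well; any distortion pushes the affected blocks, and therefore the average, below 1.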

6.2 Video sequences dataset

The video sequences selected for the experiments were obtained from benchmark sequences used in digital video processing algorithms. For the experiments, the original sequences were losslessly converted to grayscale, where the gray level is given by the average intensity of the color components.

Some of the video sequences are in CIF format, while others are in QCIF (with a quarter of the frame resolution area). The goal is to evaluate the effectiveness of the method at different resolutions. Table 6.1 lists the number of frames and the resolution of each sequence, and Figure 6.1 shows an example frame of each one.

The video sequences in the dataset were classified according to the amount of motion in them. This is very important in the analysis of the motion extension proposed in this dissertation. The Akiyo and Grandma sequences barely have any motion at all, and consecutive frames are almost identical throughout the video. The Deadline sequence contains a small amount of motion and is a little more dynamic than the previous ones. The remaining sequences, Bus, Flower and Foreman, form the high motion group and are used in the evaluation of the motion compensation extension.


Figure 6.1: Frame examples of the video sequences dataset: (a) Akiyo, (b) Bus, (c) Deadline, (d) Flower, (e) Foreman, (f) Grandma.

The Flower sequence has a lot of motion, but it is mostly constant global motion, corresponding to an almost uniform camera translation. The Bus motion, on the other hand, is quite complex, combining zoom, camera translation and local motion. Foreman has less motion than the two previously mentioned sequences, but the camera shakes slightly and there are many hard-to-predict, sudden movements.


Table 6.1: Video sequences information

Video Sequence   Number of Frames   Format   Resolution
Akiyo            300                QCIF     176x144
Bus              150                CIF      352x288
Deadline         1374               CIF      352x288
Flower           250                CIF      352x288
Foreman          300                CIF      352x288
Grandma          870                QCIF     176x144

6.3 Codec module configuration

In the lossless codec tests, the Motion PNG (MPNG) codec is used. This codec essentially encodes each frame individually as an image in PNG format into the bitstream (ISO/IEC, 2003).

The codec chosen for the lossy tests is H.264 (WIEGAND et al., 2003), since it is widely used in video applications. The encoder and decoder implementations used were the reference software, JM version 19.0 (SULLIVAN; SUHRING, 2009). The codec settings were fixed for all experiments, using the Baseline profile (WIEGAND et al., 2003), which was designed for the zero-latency restrictions of the algorithm (especially videoconferencing applications); hence B slices are not enabled. For the bitstream size evaluation, every parameter was set to the default value of the Baseline profile, and the quantization parameter (QP) was set to 4. A lower QP implies a smaller loss of detail in the quantization process, therefore resulting in a higher bitrate and video quality.

6.4 Unique sorting permutation evaluation

In Chapters 3 and 4, the unique sorting permutation methods were discussed. This dissertation proposed that the Counting sort variant method is likely to perform significantly better than the Quicksort method used by Socek, because of the stable sorting permutation principle.
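To illustrate the stable sorting permutation principle, the sketch below (in Python with NumPy; an illustration of the idea, not the dissertation's exact implementation) derives the permutation from a single counting-sort pass:

```python
import numpy as np

def stable_sorting_permutation(frame: np.ndarray, levels: int = 256) -> np.ndarray:
    """Permutation that sorts the flattened frame by pixel intensity.

    Built with one counting-sort pass, so pixels of equal intensity keep
    their raster-scan order (stability); this preserved spatial order is
    what makes the resulting permutation friendlier to the codec.
    Runs in O(P + levels) time for P pixels.
    """
    pixels = frame.ravel()
    counts = np.bincount(pixels, minlength=levels)
    # First output slot of each intensity value, advanced as pixels are placed.
    next_slot = np.concatenate(([0], np.cumsum(counts)[:-1]))
    perm = np.empty(pixels.size, dtype=np.int64)
    for idx, v in enumerate(pixels):   # single stable pass in raster order
        perm[next_slot[v]] = idx
        next_slot[v] += 1
    return perm
```

Indexing with the result, `frame.ravel()[perm]`, yields the pixel intensities in nondecreasing order, and `np.argsort(frame.ravel(), kind="stable")` produces exactly the same permutation.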

In the experiments, the sequences were encrypted using the original scheme's method with both sorting permutation algorithms, in both the lossless and the lossy scenarios. Although there is no guarantee of better results with the Counting sort variant, they are highly expected due to the natural spatial correlation in the frames of most sequences. The results for the lossless scenario confirm this expectation, showing a reduction in bitstream size for every sequence when the Counting sort variant method is used, as can be seen in Table 6.2.

For the lossy scenario, it is necessary to evaluate both compressibility and video quality.

The results for the lossy case also showed a reduction in bitstream size (Table 6.3) for every sequence, along with a slight improvement in both video quality metrics (Tables 6.4 and 6.5), when running the encryption with the H.264 codec and the quantization parameter set to 4.


Notice that another improvement brought by this extension is in processing time, since the Counting sort method has linear time complexity, as demonstrated in a previous chapter.

Table 6.2: Bitstream size comparison of encrypted video (in KB) for different unique sorting permutation algorithms, MPNG codec

                           Low               Medium     High
Algorithm                  Akiyo    Grandma  Deadline   Bus      Flower   Foreman
Quicksort                  3,124    13,578   100,512    18,694   20,887   29,148
Counting sort (proposed)   2,912    13,465   98,929     18,540   20,772   28,746

Table 6.3: Bitstream size comparison of encrypted video (in KB) for different unique sorting permutation algorithms, H.264 codec with QP = 4

                           Low               Medium     High
Algorithm                  Akiyo    Grandma  Deadline   Bus      Flower   Foreman
Quicksort                  1,017    2,908    31,893     7,842    11,177   10,514
Counting sort (proposed)   978      2,879    30,573     7,700    10,415   10,243

Table 6.4: Average PSNR comparison of encrypted video (in dB) for different unique sorting permutation algorithms, H.264 codec with QP = 4

                           Low                Medium     High
Algorithm                  Akiyo     Grandma  Deadline   Bus      Flower   Foreman
Quicksort                  38.8331   38.6844  36.8875    35.9372  36.9866  36.2588
Counting sort (proposed)   39.2015   38.8709  37.1100    35.9453  37.2788  36.3606

6.5 Motion compensated encryption evaluation

In this section, the performance of the original encryption algorithm by Socek is compared to the motion compensation extended method defined in Chapter 4. To evaluate the benefits of the extension more thoroughly, it is tested with the three motion estimation algorithms mentioned earlier (TSS, TDL and 16×16 FS). The block size was set to 8×8 and the MAD function was used as the dissimilarity function. The initial step size chosen for the TSS and TDL algorithms was 4.
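To make the block matching concrete, the sketch below (Python with NumPy; the search radius, block placement and function names are illustrative choices, not the exact configuration of the experiments) implements the MAD dissimilarity and an exhaustive full search for a single block:

```python
import numpy as np

def mad(a: np.ndarray, b: np.ndarray) -> float:
    """Mean absolute difference, the dissimilarity function used here."""
    return float(np.abs(a.astype(np.int32) - b.astype(np.int32)).mean())

def full_search(prev: np.ndarray, cur: np.ndarray,
                bi: int, bj: int, n: int = 8, radius: int = 4):
    """Best motion vector for the n x n block of `cur` at (bi, bj),
    found by testing every displacement within +/- radius in `prev`."""
    block = cur[bi:bi + n, bj:bj + n]
    h, w = prev.shape
    best_cost, best_mv = float("inf"), (0, 0)
    for di in range(-radius, radius + 1):
        for dj in range(-radius, radius + 1):
            i, j = bi + di, bj + dj
            if 0 <= i <= h - n and 0 <= j <= w - n:
                cost = mad(block, prev[i:i + n, j:j + n])
                if cost < best_cost:
                    best_cost, best_mv = cost, (di, dj)
    return best_mv, best_cost
```

TSS and TDL use the same matching criterion but probe only a small, progressively refined set of candidate displacements, trading some precision for far fewer MAD evaluations.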

When running the algorithms with the H.264 codec with QP set to 4, the bitstream size results (Table 6.6) show a significant reduction for all the extended algorithm versions on the high motion sequences, with the greatest reduction obtained when using the Full Search


Table 6.5: Average MSSIM comparison of encrypted video for different unique sorting permutation algorithms, H.264 codec with QP = 4

                           Low               Medium    High
Algorithm                  Akiyo    Grandma  Deadline  Bus     Flower  Foreman
Quicksort                  0.9733   0.9731   0.9310    0.9101  0.9223  0.9204
Counting sort (proposed)   0.9758   0.9765   0.9332    0.9112  0.9289  0.9215

algorithm for motion estimation. On the low motion sequences, the extended methods show barely any difference in bitstream size, being slightly worse or slightly better in a few cases.

Table 6.6: Bitstream size comparison (in KB) for the motion compensation extension using H.264 codec with QP = 4

                      Low              Medium    High
Algorithm             Akiyo   Grandma  Deadline  Bus     Flower  Foreman
Socek                 1,017   2,908    31,893    7,842   11,177  10,514
TDL (extended)        1,005   2,905    30,758    6,907   8,953   10,015
TSS (extended)        1,008   2,906    30,976    6,993   9,253   10,115
FS 16x16 (extended)   1,022   2,982    31,633    6,162   7,729   9,663

Using the same QP, the video quality was evaluated under the same conditions for both average MSSIM and PSNR; the results are shown in Tables 6.7 and 6.8. There was a slight but consistent improvement in video quality for all the high motion sequences. On the low motion sequences, the extended method again performed only slightly better or worse, which means that the impact of this extension is hardly noticeable in that scenario. Also, the gain in PSNR was very consistent with the gain in MSSIM: every situation with higher PSNR resulted in higher MSSIM as well.

Table 6.7: Average MSSIM comparison for the motion compensation extension using H.264 codec with QP = 4

                      Low               Medium    High
Algorithm             Akiyo    Grandma  Deadline  Bus     Flower  Foreman
Socek                 0.9733   0.9731   0.9310    0.9101  0.9223  0.9204
TDL (extended)        0.9729   0.9731   0.9307    0.9118  0.9231  0.9211
TSS (extended)        0.9730   0.9730   0.9311    0.9116  0.9230  0.9210
FS 16x16 (extended)   0.9729   0.9733   0.9310    0.9123  0.9239  0.9217

Notice that the bitstream size reduction was almost directly proportional to the motion estimation algorithm's quality. The balance between processing time and motion estimation precision must be decided according to the application scenario.


Table 6.8: Average PSNR comparison (in dB) for the motion compensation extension using H.264 codec with QP = 4

                      Low                Medium    High
Algorithm             Akiyo     Grandma  Deadline  Bus      Flower   Foreman
Socek                 38.8331   38.6844  36.8875   35.9372  36.9866  36.2588
TDL (extended)        38.8408   38.6776  36.9190   35.9624  37.1386  36.2756
TSS (extended)        38.8398   38.6754  36.9212   35.9514  37.1211  36.2633
FS 16x16 (extended)   38.7873   38.6001  36.8264   35.9992  37.3005  36.2953

The Flower sequence is the one where the extension performs best, with over 30% of bitstream size reduction compared to the Socek method (as shown in Table 6.9). This was expected, since the motion in that sequence is very linear. In Figure 6.2, it is possible to notice the significant improvement in the ‘almost-sorting’ permutation quality due to the motion compensation. On the other hand, Foreman showed the smallest improvement, which can be explained by its unstable motion and also by the amount of motion in it.

Table 6.9: Bitstream size reduction relative to the Socek method for the extended methods in high motion sequences, for different motion estimation algorithms, using H.264 codec with QP = 4

Algorithm             Flower   Bus      Foreman
Proposed (FS 16x16)   30.84%   21.42%   8.09%
Proposed (TDL)        19.89%   11.92%   4.74%
Proposed (TSS)        17.21%   10.92%   3.79%

Figure 6.2: ‘Almost-sorting’ permutation quality in Flower sequence frame 4: (a) original scheme, (b) extended method.

A plot of the bitstream size per frame can be seen in Figure 6.3. It shows that the improvement in compression happens throughout the whole video, and not only in a few frames.


Notice that the extended method only starts to use motion compensation from the third frame on, which explains why both methods have the same performance on the first two frames.

Figure 6.3: Bitstream size by frame, comparing the original method with the motion compensated extended algorithm (using TDL motion estimation): (a) Flower sequence, (b) Bus sequence.

The plots in Figure 6.4 show how varying the encoder QP value affects the quality of the encryption for each high motion sequence, as PSNR-QP curves (CHEN, 2008). The plots show the PSNR values obtained when running the algorithm with different H.264 QP values, varying from 4 to 16 in increments of 2 units. All four algorithms behaved consistently with the QP variation, with a greater reduction of PSNR at the lower QP values, which is expected given the definition of the QP parameter in H.264. For QP values over 10, the video quality approaches 30 dB or lower, so QP values below 10 are recommended for reasonable quality.

6.6 Motion compensated residual encryption evaluation

The same experiments done in the previous sections were executed for the histogram hiding extension encryption with motion compensation. It is expected that this extension grants the


Figure 6.4: PSNR-QP plots (PSNR in dB versus QP, for QP from 4 to 16; curves for Socek, TDL (extended), TSS (extended) and FS 16x16 (extended)) comparing the original Socek method with the extended motion compensation encryption version for each high motion sequence: (a) Bus, (b) Flower, (c) Foreman.

scheme a better compression rate, as discussed in Chapter 4. Tables 6.10, 6.11 and 6.12 show the bitstream size and video quality comparison.

Table 6.10: Bitstream size comparison (in KB) for the motion compensated histogram hiding extension encryption using H.264 codec with QP = 4

                      Low             Medium    High
Algorithm             Akiyo  Grandma  Deadline  Bus     Flower  Foreman
Socek                 720    2,486    25,577    7,589   12,108  10,594
TDL (extended)        718    2,503    25,525    6,698   9,709   9,997
TSS (extended)        718    2,511    25,675    6,730   9,822   10,017
FS 16x16 (extended)   718    2,572    25,925    5,936   8,588   9,603

The results are very similar to those of the previous experiment, as summarized by the bitstream size reduction for the high motion sequences shown in Table 6.13, meaning that the motion compensation principle also works well for the histogram hiding extension.


Table 6.11: Average MSSIM comparison for the motion compensated histogram hiding extension encryption using H.264 codec with QP = 4

                      Low               Medium    High
Algorithm             Akiyo    Grandma  Deadline  Bus     Flower  Foreman
Socek                 0.9752   0.9744   0.9332    0.9113  0.9245  0.9233
TDL (extended)        0.9753   0.9742   0.9337    0.9122  0.9258  0.9245
TSS (extended)        0.9749   0.9742   0.9333    0.9119  0.9255  0.9242
FS 16x16 (extended)   0.9749   0.9744   0.9336    0.9130  0.9266  0.9250

Table 6.12: Average PSNR comparison (in dB) for the motion compensated histogram hiding extension encryption using H.264 codec with QP = 4

                      Low                Medium    High
Algorithm             Akiyo     Grandma  Deadline  Bus      Flower   Foreman
Socek                 39.6183   38.9070  37.1604   31.4628  33.0015  36.0641
TDL (extended)        39.6057   38.8741  37.1120   32.5151  34.8649  36.0146
TSS (extended)        39.6032   38.8510  37.1002   32.3230  34.7211  36.0026
FS 16x16 (extended)   39.5919   38.7997  37.0544   33.8136  35.8655  35.9848

As expected, and as evidenced by the experimental results, the histogram hiding extension combined with motion compensation allowed a higher compression rate. The motion compensated method also gave the histogram hiding encryption a video quality improvement of up to 10% in average PSNR for high motion scenes. Again, the PSNR values are consistent with the MSSIM results, which is a strong indication of improved video quality.

The PSNR-QP curves in Figure 6.5 also show that the frame quality variation responds consistently to the QP parameter of the H.264 codec. For the Bus and Flower sequences, it is quite noticeable that the curves of the extended method show an almost constant improvement in average PSNR over the original Socek method. In those two sequences, an interesting observation is that the quality performance of the FS extended method is almost the same as that of the original Socek method encrypted at a QP value 4 units away, while the TDL extended method matches the Socek method at a QP 2 units higher. For the Foreman sequence, due to the small amount of linear motion, the performance in

Table 6.13: Bitstream size reduction relative to the Socek method for the motion compensated histogram hiding encryption methods using H.264 codec with QP = 4

Algorithm             Flower   Bus      Foreman
Proposed (FS 16x16)   29.97%   21.78%   9.35%
Proposed (TDL)        19.81%   11.74%   5.63%
Proposed (TSS)        18.88%   11.31%   5.44%


video quality was practically the same for all the compared methods.

Figure 6.5: PSNR-QP plots (PSNR in dB versus QP, for QP from 4 to 16; curves for Socek, TDL (extended), TSS (extended) and FS 16x16 (extended)) comparing the original Socek method with the extended motion compensation residual encryption version for each high motion sequence: (a) Bus, (b) Flower, (c) Foreman.


7 Conclusion

The video encryption technique studied and extended in this work is adequate for the requirements of video encryption applications. Its known security vulnerabilities are tolerable for most existing applications, and its efficiency, codec independence and simplicity may contribute to the choice of such an algorithm for a video application.

The strongest point of the algorithm is that it addresses both compression and encryption in a single step. The way this was originally done by Socek et al. was by increasing the frames' spatial correlation, so that they become more compressible. This work can be seen as a continuation that explores non-trivial temporal correlation more thoroughly, extending the concepts of the algorithm to the temporal correlation level as well, without compromising any of the requirements of the applications the scheme was designed for.

This dissertation also contributed to reducing the time complexity of the original scheme by using a faster implementation of the unique sorting permutation used in the scheme. The new method for computing the permutation is not only faster, but also gives the scheme better compressibility and video quality, as indicated by the experimental results.

The biggest contribution of this work, however, is making the cryptographic scheme's performance more resilient to high motion video sequences, which was not the focus of the original method. The proposed motion compensation techniques, which resulted in a conference publication (SABINO et al., 2013), introduce how the sorting permutation can be adjusted to consider the motion occurring between consecutive frames. The developed techniques rely on block motion estimation algorithms widely used in most modern video codecs. It is shown experimentally that the performance of the scheme was significantly improved by the motion compensation extension, making it possible to obtain a bitstream size reduction of 30 percent in one of the high motion video sequences while also slightly improving its video quality.

The developed extensions do not compromise the codec independence of the cryptographic scheme. In fact, even a spatial-only codec can be used with the motion compensation extension, which means that the temporal correlation techniques affect only the unique sorting


permutation and are independent of the codec itself. The motion compensation was also shown to work well with the histogram hiding extension proposed by the original authors, which improves a security aspect of the scheme and its compressibility performance.

For the motion compensation algorithm, only block-based motion estimation techniques were considered within the scope of this work. However, there are many other approaches that could possibly be adapted to fit the cryptographic scheme, and there is certainly a great opportunity for future research in this respect.

Also, in the context of motion itself, only translation parameters were computed with the motion vectors. A possible target of future work could be to include other motion aspects, such as zoom and rotation.

Lastly, the motion estimation techniques studied in this dissertation are based on BME methods, so as not to violate the codec independence of the cryptographic scheme. However, it is known that FME methods generate a much more precise motion estimation and spare the decoder side from computing motion estimation at all. It is certainly possible to create a codec-specific version of the cryptographic scheme, optimized for a given codec, that could exploit FME methods and obtain much better performance in compression, video quality and decoding processing time. This is another significant and interesting idea for future work.


References

ARMITANO, R. M.; FLORENCIO, D. A. F.; SCHAFER, R. W. The motion transform: a new motion compensation technique. In: ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, 1996 IEEE INTERNATIONAL CONFERENCE - VOLUME 04, Washington, DC, USA. Proceedings. . . IEEE Computer Society, 1996. p.2295–2298. (ICASSP 96).

BHARGAVA, B.; SHI, C.; WANG, S. MPEG Video Encryption Algorithms. Multimedia Tools and Applications Journal, Hingham, MA, USA, v.24, n.1, p.57–79, Sept. 2004.

BURROWS, M. et al. A block-sorting lossless data compression algorithm. [S.l.]: DigitalSystems Research Center, Palo Alto, California, USA, 1994.

CHEN, Z. A rate and distortion analysis for H.264/AVC video coding. In: CIRCUITS AND SYSTEMS, 2008. ISCAS 2008. IEEE INTERNATIONAL SYMPOSIUM ON. Anais. . . [S.l.: s.n.], 2008. p.1612–1615.

CORMEN, T. H. et al. Introduction to Algorithms. 2nd.ed. [S.l.]: McGraw-Hill Higher Education, 2001.

ESAKKIRAJAN, S.; VEERAKUMAR, T.; NAVANEETHAN, P. Adaptive vector quantization based video compression scheme. In: MULTIMEDIA, SIGNAL PROCESSING AND COMMUNICATION TECHNOLOGIES, 2009. IMPACT ’09. INTERNATIONAL. Anais. . . [S.l.: s.n.], 2009. p.40–43.

ESKICIOGLU, A.; FISHER, P. Image quality measures and their performance. Communications, IEEE Transactions on, [S.l.], v.43, n.12, p.2959–2965, Dec 1995.

GIROD, B. What’s Wrong with Mean-squared Error? In: WATSON, A. B. (Ed.). Digital Images and Human Vision. Cambridge, MA, USA: MIT Press, 1993. p.207–220.

GONZALEZ, R. C.; WOODS, R. E. Digital Image Processing. 3rd.ed. Upper Saddle River,NJ, USA: Prentice-Hall, Inc., 2006.

GRANGETTO, M.; MAGLI, E.; OLMO, G. Multimedia Selective Encryption by Means of Randomized Arithmetic Coding. Multimedia, IEEE Transactions on, [S.l.], v.8, n.5, p.905–917, Oct 2006.

HORVATH, E. C. Stable Sorting in Asymptotically Optimal Time and Extra Space. J. ACM,New York, NY, USA, v.25, n.2, p.177–199, Apr. 1978.

HUFFMAN, D. A Method for the Construction of Minimum-Redundancy Codes. Proceedings of the IRE, [S.l.], v.40, n.9, p.1098–1101, Sept. 1952.

IHARA, S. Information Theory for Continuous Systems. [S.l.]: World Scientific, 1993.(Series on Probability and Statistics).

IRANI, M.; ANANDAN, P. About Direct Methods. In: INTERNATIONAL WORKSHOP ON VISION ALGORITHMS: THEORY AND PRACTICE, London, UK. Proceedings. . . Springer-Verlag, 2000. p.267–277. (ICCV ’99).


ISO/IEC. Information Technology-Computer Graphics and Image Processing-PortableNetwork Graphics (PNG): functional specification - ISO/IEC 15948:2004. [S.l.: s.n.], 2003.Standard.

JAIN, J.; JAIN, A. Displacement Measurement and Its Application in Interframe Image Coding.Communications, IEEE Transactions on, [S.l.], v.29, n.12, p.1799–1808, Dec 1981.

JAKIMOSKI, G.; SUBBALAKSHMI, K. Cryptanalysis of Some Multimedia Encryption Schemes. Multimedia, IEEE Transactions on, [S.l.], v.10, n.3, p.330–338, April 2008.

JAMKAR, S. et al. A comparison of block-matching search algorithms in motion estimation. In: INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATION, 15. Proceedings. . . International Council for Computer Communication, 2002. p.730–739. (ICCC ’02).

KATZ, J.; LINDELL, Y. Introduction to Modern Cryptography (Chapman & Hall/CrcCryptography and Network Security Series). [S.l.]: Chapman & Hall/CRC, 2007.

KLOMP, S.; OSTERMANN, J. Motion Estimation at the Decoder. [S.l.]: InTech, 2011. p.77–92.

KOGA, T. et al. Motion compensated interframe coding for video conferencing. Proceedings NTC’81 (IEEE), [S.l.], 1981.

LI, R.; ZENG, B.; LIOU, M. L. A new three-step search algorithm for block motion estimation. IEEE Trans. Circuits Syst. Video Technol, [S.l.], v.4, p.438–442, Aug 1994.

LI, S.; CHEN, G.; ZHENG, X. Chaos-Based Encryption for Digital Images and Videos. In: FURHT, B.; KIROVSKI, D. (Ed.). Multimedia Security Handbook. [S.l.]: CRC Press, LLC, 2004. p.133–167.

LI, S. et al. A General Cryptanalysis of Permutation-Only Multimedia Encryption Algorithms. IACR Cryptology ePrint Archive: Report 2004/374, 2004.

LIU, F.; KOENIG, H. A novel encryption algorithm for high resolution video. In: INTERNATIONAL WORKSHOP ON NETWORK AND OPERATING SYSTEMS SUPPORT FOR DIGITAL AUDIO AND VIDEO, New York, NY, USA. Proceedings. . . [S.l.: s.n.], 2005. p.69–74. (NOSSDAV ’05).

MAPLES, T.; SPANOS, G. Performance Study of a Selective Encryption Scheme for the Security of Networked, Real-Time Video. In: INTERNATIONAL CONFERENCE ON COMPUTER COMMUNICATIONS AND NETWORKS, 4., Washington, DC, USA. Proceedings. . . IEEE Computer Society, 1995. p.2. (ICCCN ’95).

MAYER-PATEL, K.; LE, L.; CARLE, G. An MPEG performance model and its application to adaptive forward error correction. In: ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 10., New York, NY, USA. Proceedings. . . [S.l.: s.n.], 2002. p.1–10. (MULTIMEDIA ’02).

MENEZES, A. J.; VANSTONE, S. A.; OORSCHOT, P. C. V. Handbook of Applied Cryptography. 1st.ed. Boca Raton, FL, USA: CRC Press, Inc., 1996.


PO, L.; MA, W.-C. A Novel Four-Step Search Algorithm for Fast Block Motion Estimation.IEEE Trans. Circuits Syst. Video Technol, [S.l.], v.6, p.313–317, 1996.

SABINO, C. C. et al. Motion Compensation Techniques in Permutation-Based Video Encryption. In: IEEE INTERNATIONAL CONFERENCE ON SYSTEMS, MAN, AND CYBERNETICS, 2013., Washington, DC, USA. Proceedings. . . [S.l.: s.n.], 2013. p.1578–1581.

SHANNON, C. A mathematical theory of communication. Bell System Technical Journal,The, [S.l.], v.27, n.3, p.379–423, July 1948.

SHI, C.; BHARGAVA, B. A fast MPEG video encryption algorithm. In: ACM INTERNATIONAL CONFERENCE ON MULTIMEDIA, 6., New York, NY, USA. Proceedings. . . [S.l.: s.n.], 1998. p.81–88. (MULTIMEDIA ’98).

SHI, C.; BHARGAVA, B. An Efficient MPEG Video Encryption Algorithm. In: THE 17TH IEEE SYMPOSIUM ON RELIABLE DISTRIBUTED SYSTEMS, Washington, DC, USA. Proceedings. . . IEEE Computer Society, 1998. p.381. (SRDS ’98).

SINGH, G.; SUPRIYA. A Study of Encryption Algorithms (RSA, DES, 3DES and AES) for Information Security. International Journal of Computer Applications, [S.l.], v.67, n.19, p.33–38, April 2013.

SOCEK, D. et al. Digital video encryption algorithms based on correlation-preserving permutations. EURASIP J. Inf. Secur., New York, NY, United States, v.2007, p.10:1–10:8, Jan. 2007.

SULLIVAN, G.; SUHRING, K. H.264/14496-10 AVC Reference Software Manual. [S.l.]:Joint Video Team (JVT), 2009.

TANG, L. Methods for encrypting and decrypting MPEG video data efficiently. In: ACMINTERNATIONAL CONFERENCE ON MULTIMEDIA, 4., New York, NY, USA.Proceedings. . . [S.l.: s.n.], 1996. p.219–229. (MULTIMEDIA ’96).

TEO, P.; HEEGER, D. Perceptual image distortion. In: IMAGE PROCESSING, 1994. PROCEEDINGS. ICIP-94., IEEE INTERNATIONAL CONFERENCE. Anais. . . [S.l.: s.n.], 1994. v.2, p.982–986.

TORR, P. H. S.; ZISSERMAN, A. Feature Based Methods for Structure and Motion Estimation. In: INTERNATIONAL WORKSHOP ON VISION ALGORITHMS: THEORY AND PRACTICE, London, UK. Proceedings. . . [S.l.: s.n.], 2000. p.278–294. (ICCV ’99).

WANG, Z. et al. Image quality assessment: from error visibility to structural similarity. ImageProcessing, IEEE Transactions on, [S.l.], v.13, n.4, p.600–612, April 2004.

WIEGAND, T. et al. Overview of the H.264/AVC video coding standard. Circuits and Systems for Video Technology, IEEE Transactions on, [S.l.], v.13, n.7, p.560–576, July 2003.

ZHU, S.; MA, K. A New Diamond Search Algorithm for Fast Block-matching MotionEstimation. IEEE Trans. Image Processing, [S.l.], v.9, n.2, p.287–290, 2000.