
DIGITAL WATERMARKING

BASED ON HUMAN VISUAL SYSTEM

A THESIS SUBMITTED TO

THE GRADUATE SCHOOL OF NATURAL AND APPLIED SCIENCES

OF

THE MIDDLE EAST TECHNICAL UNIVERSITY

BY

ALPER KOZ

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF SCIENCE

IN

THE DEPARTMENT OF ELECTRICAL AND ELECTRONICS ENGINEERING

SEPTEMBER 2002

Approval of the Graduate School of Natural and Applied Sciences

Prof. Dr. Tayfur Öztürk Director

I certify that this thesis satisfies all the requirements as a thesis for the degree

of Master of Science.

Prof. Dr. Mübeccel Demirekler Head of Department

This is to certify that we have read this thesis and that in our opinion it is fully

adequate, in scope and quality, as a thesis for the degree of Master of Science.

Assoc. Prof. Dr. A. Aydın Alatan Supervisor

Examining Committee Members

Prof. Dr. Levent Onural

Assoc. Prof. Dr. A. Aydın Alatan

Assoc. Prof. Dr. Gözde Bozdağı Akar

Assoc. Prof. Dr. Tolga Çiloğlu

Assoc. Prof. Dr. Engin Tuncer

ABSTRACT

DIGITAL WATERMARKING BASED ON HUMAN VISUAL SYSTEM

Koz, Alper
M.Sc., Department of Electrical and Electronics Engineering
Supervisor: Assoc. Prof. Dr. A. Aydın Alatan

September 2002, 80 pages

Recent progress in digital multimedia technologies has offered many facilities in the transmission, reproduction and manipulation of data. However, this advance has also brought problems, such as copyright protection, for content providers. Digital watermarking is one of the proposed solutions for the copyright protection of multimedia. Watermarking embeds an imperceptible signal into data, such as audio, image and video, which indicates whether or not the content is copyrighted. Within this scope, digital watermarking methods that exploit various aspects of the human visual system (HVS) in order to provide an imperceptible and robust watermark are reviewed. Then, two watermarking methods, based respectively on the foveation and temporal sensitivity phenomena of HVS, are proposed. These phenomena have not previously been exploited for the purpose of digital watermarking. The first proposed method embeds the watermark into the image periphery according to foveation-based HVS contrast thresholds. Compared to other HVS-based watermarking methods, the simulation results demonstrate improved robustness of the proposed approach against image degradations, such as JPEG compression, cropping and additive Gaussian noise. In addition, the proposed image method is adapted for video, and the robustness of the adapted method against ITU H.263+ coding is tested. The second method, which is proposed only for video watermarking, exploits the temporal contrast thresholds of HVS to determine the locations where the watermark should be embedded and the maximum strength of the watermark. The results demonstrate that the proposed scheme survives video distortions, such as additive Gaussian noise, ITU H.263+ coding at bit rates not lower than 230-240 kbps, frame dropping and frame averaging.

Keywords: Digital Watermarking, Human Visual System, Contrast Thresholds, Contrast Masking, Foveation, Temporal Sensitivity, H.263.

ÖZ

INVISIBLE WATERMARKING BASED ON THE HUMAN VISUAL SYSTEM

Koz, Alper
M.Sc., Department of Electrical and Electronics Engineering
Supervisor: Assoc. Prof. Dr. A. Aydın Alatan

September 2002, 80 pages

In recent years, the development of digital technology has greatly facilitated the production, transmission and use of digital information. However, this development has also made problems such as copyright protection more pronounced. Invisible digital watermarking is one of the solutions proposed for this problem. An invisible watermark is hidden inside data such as audio, images and video, and enables the copyright owner to prove ownership in case of unauthorized use of the data. Within this framework, invisible watermarking methods that use properties of the human visual system (HVS) are reviewed, and two different methods are proposed, based on the foveation property of HVS and on its sensitivity to temporal variations. The first method uses foveation-based contrast thresholds to embed the watermark into the image with a magnitude that increases with distance from the point of gaze. The robustness of the method to attacks such as additive Gaussian noise, cropping and JPEG compression is demonstrated, and it is shown to give better results than previous HVS-based methods. The method is also adapted for video, and its robustness to ITU H.263+ coding is demonstrated. The second method, proposed only for video watermarking, uses the temporal contrast thresholds of HVS to determine in which parts of the video the watermark is embedded and the magnitude of the watermark. The robustness of the method to attacks such as ITU H.263+ coding (for bit rates higher than 230-240 kbps), additive Gaussian noise, frame averaging and frame dropping is demonstrated.

Keywords: Human Visual System, Invisible Watermarking, Foveation, H.263+, Temporal Contrast Threshold.

ACKNOWLEDGMENTS

I would like to thank my supervisor, Assoc. Prof. Dr. A. Aydın Alatan, for his valuable supervision and support during the preparation of this thesis.

TABLE OF CONTENTS

ABSTRACT ................................................................. iii
ÖZ ....................................................................... v
ACKNOWLEDGEMENTS ........................................................ vii
TABLE OF CONTENTS ....................................................... viii
LIST OF TABLES .......................................................... x
LIST OF FIGURES ......................................................... xi
LIST OF ABBREVIATIONS ................................................... xiii

CHAPTER

1 INTRODUCTION .......................................................... 1
  1.1 Watermarking Applications ......................................... 2
  1.2 Watermarking Requirements ......................................... 3
  1.3 Trade-off Between Requirements .................................... 5
  1.4 The Importance of Visual Models ................................... 7
  1.5 Problem Statement ................................................. 9
  1.6 Outline of Dissertation ........................................... 9
2 BASICS OF HUMAN VISUAL SYSTEM ........................................ 11
  2.1 Contrast and Contrast Thresholds .................................. 11
    2.1.1 Light Adaptation .............................................. 14
    2.1.2 Contrast Masking .............................................. 17
  2.2 Spatial and Temporal Masking ...................................... 23
  2.3 Foveation ......................................................... 26
  2.4 Temporal Sensitivity .............................................. 33
    2.4.1 Fundamental Definitions ....................................... 34
    2.4.2 Temporal Contrast Sensitivity Function ........................ 35
    2.4.3 Temporal Contrast Thresholds for Spatial DCT Frequencies ..... 39
3 WATERMARKING BASED ON VISUAL MODELS .................................. 41
  3.1 Image Watermarking Methods Based on Visual Models ................ 41
  3.2 Video Watermarking Methods Based on Visual Models ................ 45
4 FOVEATED IMAGE WATERMARKING .......................................... 47
  4.1 Introduction ...................................................... 47
  4.2 Foveation ......................................................... 48
  4.3 Proposed Watermarking Method ...................................... 49
  4.4 Adaptation of the Method to Videos ................................ 53
  4.5 Experimental Results .............................................. 53
5 TEMPORAL WATERMARKING OF DIGITAL VIDEOS .............................. 59
  5.1 Introduction ...................................................... 59
  5.2 Watermarking Procedure ............................................ 61
  5.3 Watermark Detection ............................................... 63
  5.4 Simulation Results ................................................ 65
    5.4.1 Robustness to Additive Gaussian Noise ......................... 69
    5.4.2 Robustness to ITU H.263+ Coding ............................... 70
    5.4.3 Robustness to Frame Averaging and Dropping .................... 71
6 SUMMARY AND DISCUSSIONS .............................................. 75

REFERENCES .............................................................. 77

LIST OF TABLES

TABLE
2.1 Quantization levels for four-level DWT transform .................... 22
4.1 Correlation Results against Cropping ................................ 57
4.2 Correlation Results against Additive Gaussian Noise ................. 57
4.3 Correlation Results against JPEG Compression ........................ 57
4.4 Correlation Results against ITU H.263+ Coding ....................... 58
5.1 Correlation Results for Coast and Carphone Sequences after Additive Gaussian Noise ... 69
5.2 Correlation Results for Coast and Carphone Sequences after ITU H.263+ Coding ... 71
5.3 Correlation Results for Coast and Carphone Sequences after Frame Dropping ... 72
5.4 Correlation Results for Coast and Carphone Sequences after Frame Averaging ... 72

LIST OF FIGURES

FIGURE
1.1 General Scheme for Watermarking ..................................... 2
1.2 The illustration of the trade-off between the imperceptibility and robustness ... 6
1.3 An example illustrating perceptual brightness is not a monotonic function of intensity ... 8
2.1 Demonstration that apparent brightness is not only dependent on absolute luminance ... 12
2.2 Examples for the spatial patterns where the Weber contrast is used .. 12
2.3 The demonstration of Michelson contrast for a sinusoidal grating of a spatial frequency ... 13
2.4 The configuration for the experiments to measure contrast threshold . 15
2.5 Contrast sensitivity as a function of spatial frequency ............. 16
2.6 The change in the detection threshold as a function of mean luminance ... 16
2.7 The configuration for the experiments conducted to study contrast masking ... 20
2.8 The amplitude of the signal in ... .................................. 21
2.9 The demonstration of the change in contrast threshold as a function of masker contrast ... 22
2.10 Visibility thresholds for a narrow bar of white noise in the ... ... 24
2.11 Visibility thresholds for a 40 ms flash of dynamic white ... ....... 25
2.12 Anatomy of the human eye ........................................... 26
2.13 Rods, cones and ganglion cell density as a function of eccentricity  27
2.14 Original Lena image and its foveated version ....................... 27
2.15 The configuration for the experiments to determine ... ............. 29
2.16 Contrast sensitivity for patches of sinusoidal grating as a function ... 29
2.17 The configuration for the experiments to determine the critical ... . 31
2.18 Discrete wavelet transform structure ............................... 32
2.19 Some terms to describe visual stimuli .............................. 34
2.20 The target in (a) is modulated with respect to the ... ............. 36
2.21 Temporal Contrast Sensitivity Function of HVS ...................... 37
2.22 The spatial configurations of the two different targets ............ 38
2.23 The effect of spatial frequency upon temporal contrast ... ......... 38
2.24 Temporal contrast thresholds for spatial DCT frequencies ... ....... 40
2.25 Temporal (a), spatial (b) and orientation (c) components of ... .... 40
4.1 Typical Geometry .................................................... 48
4.2 Contrast Threshold Weight Function .................................. 50
4.3 Illustration of the difference between the previous ... ............. 52
4.4 Original image and watermarked image according to proposed method ... 55
4.5 (a) original image, (b) watermarked image according to proposed ... . 56
5.1 Overall structure of the watermarking process ....................... 61
5.2 Overall structure of the watermark detection process ................ 64
5.3 Frame from Coast video. (a) original frame, (b) watermarked frame ... 66
5.4 Frame from Carphone video. (a) original frame, (b) watermarked frame  66
5.5 The number of watermarked coefficients vs. discrete temporal ... .... 67
5.6 Illustration of where the watermark is embedded ... ................. 68
5.7 Mean of the inner product results (see Eqn. (5.5)) as a function of the discrete temporal frequency after additive Gaussian noise ... 70
5.8 Mean of the inner product results (see Eqn. (5.5)) as a function of the discrete temporal frequency after ITU H.263+ coding at a bit rate of 230 kbps ... 71
5.9 Mean of the inner product results (see Eqn. (5.5)) as a function of the discrete temporal frequency after frame dropping ... 73
5.10 Mean of the inner product results (see Eqn. (5.5)) as a function of the discrete temporal frequency after frame averaging ... 74

LIST OF ABBREVIATIONS

CIF     Common Interface Format
CT      Contrast Threshold
CS      Contrast Sensitivity
DCT     Discrete Cosine Transform
DWT     Discrete Wavelet Transform
HVS     Human Visual System
IDCT    Inverse Discrete Cosine Transform
ITU     International Telecommunication Union
PSNR    Peak Signal to Noise Ratio
TCSF    Temporal Contrast Sensitivity Function
QCIF    Quarter Common Interface Format

CHAPTER 1

INTRODUCTION

In recent years, digital multimedia technology has shown significant progress. This technology offers many new advantages compared to its old analog counterpart. The advantages during the transmission of data, easy editing of any part of the digital content, the capability to copy digital content without any loss in quality, and many other advantages in DSP, VLSI and communication applications have made digital technology superior to analog systems. In particular, the growth of digital multimedia technology has shown itself in Internet and wireless applications, where the distribution and use of multimedia data has become much easier and faster with the great success of the Internet.

The great explosion in this technology has also brought some problems besides its advantages. The ease of copying digital content rapidly, perfectly and without limitations on the number of copies has resulted in the problem of copyright protection. Digital watermarking is proposed as a solution to prove the ownership of digital data. A watermark, a secret imperceptible signal, is embedded into the original data in such a way that it remains present as long as the perceptible quality of the content is at an acceptable level. The owner of the original data proves his/her ownership by extracting the watermark from the watermarked content in case of multiple ownership claims.

A general scheme for digital watermarking is given in Figure 1.1. The secret signature (watermark) is embedded into the cover image by using a secret key at the coder (C). Only the owner of the data knows the key, and it is not possible to remove the message from the data without knowledge of the key. Then, the watermarked image passes through the transmission channel. The transmission channel includes the possible attacks, such as lossy compression, geometric distortions, any signal processing operation, digital-to-analog and analog-to-digital conversion, etc. After the watermarked image passes through these possible operations, the decoder (D) attempts to extract the message.
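The role of the secret key can be sketched in a few lines of Python. This is an illustrative sketch, not the thesis's implementation: the key-to-seed mapping and the sequence length are our assumptions; the zero-mean unit-variance Gaussian sequence matches the spread spectrum method reviewed later in this chapter.

```python
import random

def generate_watermark(secret_key, length=1000):
    # The secret key seeds the PRNG; only the key holder can regenerate
    # the exact zero-mean, unit-variance Gaussian watermark sequence.
    rng = random.Random(secret_key)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]

# Same key -> identical sequence; a different key gives an unrelated one,
# so the watermark cannot be located or removed without the key.
assert generate_watermark("owner-key") == generate_watermark("owner-key")
assert generate_watermark("owner-key") != generate_watermark("attacker-key")
```

Because the sequence is reproducible only from the key, the detector at (D) can regenerate it for correlation without the key ever appearing in the watermarked data.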

Figure 1.1 General Scheme for Watermarking

1.1 Watermarking Applications

Although the main motivation behind digital watermarking is copyright protection, its applications are not that restricted. Digital watermarking has a wide application area, including broadcast monitoring, fingerprinting, authentication and covert communication [1, 2, 3, 4].

By embedding watermarks into commercial advertisements, an automated system can monitor whether the advertisements are broadcast at the correct instants [1, 2]. The system receives the broadcast and searches for these watermarks, identifying where and when the advertisement was broadcast. The same process can also be used for video and sound clips. Musicians and actors may wish to ensure that they receive accurate royalties for broadcasts of their performances.

Fingerprinting is a novel approach to trace the source of illegal copies [1, 2]. The owner of the digital data may embed different watermarks in the copies of the digital content, customized for each recipient. In this manner, the owner can identify the customer by extracting the watermark in case the data is supplied to third parties.

Digital watermarking can also be used for authentication [1, 2]. Authentication is the detection of whether the content of the digital data has changed. As a solution, a fragile watermark embedded in the digital content indicates whether the data has been altered: if any tampering has occurred in the content, the same change will also occur in the watermark. It can also provide information about which part of the content has been altered.

Covert communication is another possible application of digital watermarking [1, 2]. The watermark, a secret message, can be embedded imperceptibly in a digital image or video to communicate information from the sender to the intended receiver while maintaining a low probability of intercept by unintended receivers.

There are also non-secure applications of digital watermarking. It can be used for indexing of videos, movies and news items, where markers and comments can be inserted by search engines [2]. Another non-secure application of watermarking is the detection and concealment of image/video transmission errors [5]. For block-based coded images, summarizing data for every block is extracted and hidden in another block by data hiding. At the decoder side, this data is used to detect and conceal block errors.

1.2 Watermarking Requirements

The efficiency of a digital watermarking process is evaluated according to properties such as perceptual transparency, robustness, computational cost, bit rate of the data embedding process, false positive rate, recovery of data with or without access to the original signal, the speed of the embedding and retrieval process, and the ability of the embedding and retrieval modules to integrate into standard encoding and decoding processes [1, 2, 6, 7].

Depending on the application, the properties used in the evaluation vary. For example, in a video indexing application, evaluating the robustness of a watermarking scheme to signal processing is meaningless, since the video never passes through such operations. In a covert communication application, it is better to use a watermarking scheme that does not need the original data during watermark detection if real TV broadcasting is used as the communication channel, while most watermarking schemes in other applications need the original data during detection. If the application is copyright protection, the owner of the original data may wait several days to insert or detect the watermark if the data is valuable enough. On the other hand, in a broadcast monitoring application, watermark detection should be as fast as real-time broadcasting. As a result, each watermarking application has its own requirements, and the efficiency of a watermarking scheme should be evaluated against them.

As noted, the main motivation behind digital watermarking is copyright protection. The owner of the original data wants to prove his/her ownership in case the original data is copied, edited and used without permission. In the watermarking research community, this problem has been analyzed in more detail [7, 8, 9, 10, 11, 12]. Researchers in this area have focused on the requirements for useful and effective watermarks for copyright protection. The requirements for an effective watermark are imperceptibility, robustness to intended or unintended signal operations, and capacity.

Imperceptibility refers to the perceptual similarity between the original and watermarked data. The owner of the original data mostly does not tolerate any kind of degradation of the original; therefore, the original and watermarked data should be perceptually the same. The imperceptibility of the watermark is tested by means of subjective experiments [8]. The original and watermarked data are presented to a number of subjects in random order, and the subjects are asked which of the two works they find more pleasant. If the percentage of answers for each of the two is approximately equal to 50%, then the watermarked data is perceptually equal to the original data.
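The decision rule of this two-alternative test can be written as a toy function. The 5% tolerance band below is our assumption; the thesis only states the 50% criterion.

```python
def perceptually_equal(votes_original, votes_watermarked, tolerance=0.05):
    # If roughly half of the subjects prefer each version, the watermark
    # is judged imperceptible; `tolerance` is an assumed acceptance band
    # around the ideal 50% split.
    share = votes_original / (votes_original + votes_watermarked)
    return abs(share - 0.5) <= tolerance

print(perceptually_equal(26, 24))  # 52% vs 48%: within the band -> True
print(perceptually_equal(40, 10))  # 80% prefer the original -> False
```

In practice, the tolerance should reflect the number of subjects: with few subjects, a split well away from 50% can still occur by chance.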

Robustness to a signal processing operation refers to the ability to detect the watermark after the watermarked data has passed through that operation. The robustness of a watermarking scheme can vary from one operation to another: although a scheme may be robust to signal compression operations, it may not be robust to geometric distortions such as cropping, rotation and translation (in the case where the data is an image). The signal processing operations against which a watermarking scheme should be robust also change from application to application. While, for the broadcast monitoring application, robustness to the transmission of the data over a channel is sufficient, this is not the case for the copyright protection application. In that case, it is totally unknown which signal processing operations the watermarked data will pass through. Hence, the watermarking scheme should be robust to any possible signal processing operation, as long as the quality of the watermarked data is preserved.
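Detecting a watermark after an attack is typically posed as a correlation test: the extracted sequence is compared against the owner's watermark, and a score above a threshold signals presence. The similarity measure below follows the normalized inner product used in spread spectrum watermarking [7]; the noise level and the decision threshold of 6.0 are illustrative assumptions.

```python
import math
import random

def similarity(w, w_extracted):
    # Normalized correlation between the owner's watermark and the
    # sequence extracted from a possibly attacked image.
    dot = sum(a * b for a, b in zip(w, w_extracted))
    return dot / math.sqrt(sum(b * b for b in w_extracted))

rng = random.Random(1)
w = [rng.gauss(0.0, 1.0) for _ in range(1000)]
attacked = [x + rng.gauss(0.0, 0.5) for x in w]         # watermark plus channel noise
unrelated = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # no watermark present

# A high score only arises when the true watermark is present.
print(similarity(w, attacked) > 6.0)    # True: detection survives the noise
print(similarity(w, unrelated) > 6.0)   # False: no false positive here
```

The score for an unwatermarked sequence behaves like a standard normal variable, which is what makes a fixed threshold meaningful across attacks.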

The capacity requirement of a watermarking scheme refers to the ability to verify and distinguish between different watermarks with a low probability of error as the number of differently watermarked versions of an image increases [11]. As the robustness of the watermarking method increases, the capacity also increases, while the imperceptibility decreases. There is a trade-off between these requirements, and this trade-off should be taken into account when a watermarking method is proposed.

1.3 Trade-off Between Requirements

In order to show the trade-off between the robustness and imperceptibility requirements, a popular spread spectrum image watermarking method is examined [7]. In this method, first the two-dimensional Discrete Cosine Transform (DCT) of the image is computed. Then, the 1000 largest coefficients are determined, and a watermark sequence of length 1000, generated from a zero-mean unit-variance Gaussian distribution, is added to those coefficients by using the following relation:

    I*(u,v) = I(u,v) (1 + α W(u,v))                                  (1.1)

where I*(u,v) is the watermarked coefficient, I(u,v) is the DCT coefficient of the original image, W(u,v) is the watermark component added to the (u,v)-th DCT coefficient of the image, and α is the scale factor that determines the trade-off between imperceptibility and robustness. If α increases, the energy added to the image obviously increases and it becomes easier to detect the watermark; in other words, the robustness of the watermarking scheme improves with a greater α. On the other hand, an increase in α produces more distortion in the image. Watermarked images for different α values are illustrated in Figure 1.2. As α increases, the distortion in the image becomes more severe. Therefore, the maximum value of α that still does not result in perceptible distortion should be determined to achieve maximum robustness. In [7], α is taken as 0.1. Such a trade-off will always exist between different requirements; hence, the "best" method is determined by the application.
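The embedding rule (1.1) can be sketched as follows. The DCT stage is omitted: the list below merely stands in for the 1000 largest DCT coefficients of an image, with made-up magnitudes.

```python
import random

alpha = 0.1  # scale factor; the value used in [7]

rng = random.Random(0)
# Stand-in for the 1000 largest DCT coefficients (invented magnitudes).
coeffs = [rng.uniform(50.0, 500.0) for _ in range(1000)]
watermark = [rng.gauss(0.0, 1.0) for _ in range(1000)]

# Eq. (1.1): I*(u,v) = I(u,v) * (1 + alpha * W(u,v))
marked = [c * (1.0 + alpha * w) for c, w in zip(coeffs, watermark)]

# Multiplicative embedding scales the perturbation with the coefficient:
# the relative change of each coefficient is exactly alpha * |W|.
relative = [abs(m - c) / abs(c) for m, c in zip(marked, coeffs)]
print(all(abs(r - alpha * abs(w)) < 1e-12 for r, w in zip(relative, watermark)))
```

Because the perturbation is proportional to the coefficient magnitude, large (perceptually significant) coefficients carry most of the watermark energy, which is the point of the trade-off discussion above.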

Figure 1.2 The illustration of the trade-off between the imperceptibility and robustness. The original image is watermarked using the Spread Spectrum Watermarking Method. (a) original Lena image; (b) α = 0.1; (c) α = 0.4; (d) α = 0.7.

1.4 The Importance of Visual Models

In the digital watermarking literature, more sophisticated approaches are used to manage the trade-off between imperceptibility and robustness. In principle, most of these approaches exploit deficiencies of the Human Visual System (HVS). For instance, perceived brightness is not a simple function of intensity. Figure 1.3 illustrates the case: the actual intensity distribution of the image in Figure 1.3 (a) is plotted in Figure 1.3 (b). While each strip in the pattern is uniform in physical intensity, the perceived brightness within each strip is not uniform; the right side of each strip seems brighter than the left side. In conclusion, HVS is not a perfect detector, and this fact creates the opportunity for digital watermarking: it is possible to make modifications in visual data that remain imperceptible to HVS.

Watermarking schemes that use visual models can be modeled as follows [13]:

    I* = I + f(I, w)                                                 (1.2)

where I is the original image, I* is the watermarked image, and the signal added to I is a function of the watermark signal w and of I. For example, one simple case of (1.2) is the spread spectrum watermarking method [7], where f is equal to α I(u,v) W(u,v) (see (1.1)). In such a watermarking scheme, when the image energy at a particular frequency (u,v) is small, the watermark energy inserted into that frequency is also reduced, which avoids visible artifacts in the image. On the other hand, when the image energy is large at that frequency, the watermark energy is increased; hence, the robustness of the system improves.

If an image-independent scheme is used, (1.2) reduces to the following form:

    I* = I + w                                                       (1.3)

The disadvantage of such a scheme is that it shapes the watermark spectrum independently of the image. The power present in the frequency bands varies greatly from image to image. If the image energy in a particular band is very low and the watermark energy in that band is high, then artifacts are created in the image, since the watermark energy is too strong relative to the image. In addition, with such a scheme, it is not possible to add more watermark energy at a particular frequency where the image energy is high in order to improve robustness.

� The�critical�point�in�digital�watermarking�schemes�is�to�determine�the�function� f in�

(1.2).� The� use� of� perceptual� model� shows� its� importance� at� this� point.� By� the� use� of�

perceptual� models,� it� is� possible� to� determine� which� parts� of� the� image� are� significant� to�

HVS�and� to�determine� the�strength�of� the�watermark�sequence,�which�yields� imperceptible�

distortions�in�the�image�while�achieving�maximum�robustness.�
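A tiny numeric comparison makes the difference between (1.2) with f = α I W and the image-independent form (1.3) concrete. All coefficient values here are invented for illustration.

```python
alpha = 0.1
coeffs = [400.0, 120.0, 4.0]   # strong, medium and weak frequency bands
w = [1.0, -1.0, 1.0]           # watermark samples

# Image-adaptive shaping, f = alpha * I * W, as in Eq. (1.1):
adaptive = [alpha * c * x for c, x in zip(coeffs, w)]
# Image-independent embedding, Eq. (1.3), with a fixed strength:
flat = [10.0 * x for x in w]

print(adaptive)  # the weak band receives only a tiny perturbation
print(flat)      # the same 10.0 lands on a band of energy 4: a visible artifact
```

The adaptive scheme also adds the most energy exactly where the image can hide it, which is why it is both less visible and harder to remove.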

Figure 1.3 An example illustrating that perceptual brightness is not a monotonic function of intensity. Although each strip in the image (a) has uniform intensity, the perceptual brightness of each strip is not uniform. The actual intensity distribution is shown in (b).

1.5 Problem Statement

In this thesis, we first review the basics of HVS; such a review is required in order to understand digital watermarking methods based on HVS. In the review, we also examine the foveation and temporal sensitivity phenomena of HVS, which have not previously been analyzed for the purpose of digital watermarking. Then, we propose two watermarking schemes that exploit these phenomena, respectively.

Briefly, the foveation phenomenon of HVS corresponds to the fact that the sampling density of HVS decreases rapidly away from the point of gaze. This fact is characterized by contrast thresholds in vision research: the contrast threshold of HVS is minimum at the gaze point and rises rapidly as the distance from the gaze point grows. By using these contrast thresholds, it is possible to propose a watermarking method that embeds the watermark energy mostly into the periphery of the image. The details of the proposed method are given in Chapter 4.
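Purely as an illustration of the idea, not the thesis's model (which appears in Chapter 4), an eccentricity-dependent embedding weight could look like the following; the linear growth and the constant k are invented for the sketch.

```python
import math

def embedding_weight(x, y, gaze=(128, 128), k=0.01):
    # Invented stand-in for foveation-based weighting: the allowed
    # watermark strength grows with eccentricity (distance from the gaze
    # point), mirroring the rise of the HVS contrast threshold there.
    eccentricity = math.hypot(x - gaze[0], y - gaze[1])
    return 1.0 + k * eccentricity

print(embedding_weight(128, 128))                          # minimum at the gaze point
print(embedding_weight(0, 0) > embedding_weight(64, 64))   # periphery outweighs mid-field
```

Scaling the watermark by such a weight is what concentrates its energy in the periphery while keeping the foveated region nearly untouched.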

In the second method, we exploit temporal sensitivity, which refers to the sensitivity of HVS to temporal fluctuations in the visual target. This phenomenon is characterized by temporal contrast thresholds. By using these thresholds, we propose a video watermarking method that embeds the watermark into the video in the temporal direction. The thresholds determine the locations where the watermark should be embedded and the maximum strength of the watermark that yields imperceptible distortion in the video.
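The role of the thresholds can be sketched schematically. All numbers below are invented; the measured temporal contrast thresholds are presented in Chapter 2 and used in the actual method in Chapter 5.

```python
def temporal_embed(temporal_coeffs, watermark, thresholds):
    # Each temporal-frequency coefficient of a pixel's intensity over time
    # may be perturbed by at most the temporal contrast threshold for that
    # frequency; with |w| <= 1 the change stays below the visibility limit.
    return [c + t * w for c, w, t in zip(temporal_coeffs, watermark, thresholds)]

coeffs = [10.0, 4.0, 1.0, 0.5]       # temporal DCT coefficients (made up)
wm = [1.0, -1.0, 1.0, -1.0]
thresholds = [0.0, 0.5, 1.0, 1.5]    # invented: slow flicker is most visible,
                                     # so its threshold (and watermark) is zero
print(temporal_embed(coeffs, wm, thresholds))  # [10.0, 3.5, 2.0, -1.0]
```

A zero threshold thus both selects the embedding locations (frequencies that receive no watermark) and caps the watermark strength elsewhere, which is exactly the dual role the thresholds play in the proposed method.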

1.6 Outline of Dissertation

Chapter 2: The basics of HVS, such as the contrast concept, spatial and temporal masking, the foveation phenomenon and temporal sensitivity, are defined.

Chapter 3: A literature review on HVS-based digital image and video watermarking is given.

Chapter 4: A digital image watermarking method which exploits the foveation phenomenon of HVS is proposed. The method is also extended to video. The robustness of the methods to possible image and video processing operations is tested.

Chapter 5: A digital video watermarking method based on the temporal sensitivity of HVS is proposed. The robustness of the method to typical video attacks, such as additive Gaussian noise, ITU H.263+ coding, frame dropping and frame averaging, is tested.

Chapter 6: Concluding remarks are given, and possible extensions and improvements are discussed.

CHAPTER 2

BASICS OF HUMAN VISUAL SYSTEM

This chapter presents an overview of the Human Visual System (HVS) basics that are used within the scope of image and video watermarking. The first section gives the definitions of contrast for simple gratings and explains the concept of contrast thresholds. Since contrast thresholds are of great importance in determining the maximum strength of the watermark to be embedded in an image or video, the factors that affect contrast thresholds are also examined. The second section explains the spatial and temporal masking phenomena of HVS. The third section presents the foveation characteristic of HVS; this part forms the background for the image and video watermarking method proposed in Chapter 4, so the basics of foveation are given in detail. The fourth section explains the temporal sensitivity of HVS: the visual experiments conducted to measure temporal contrast thresholds are described, and the way these thresholds change with the spatial configuration of the visual target is analyzed. The video watermarking method proposed in Chapter 5 is mostly based on this section.

2.1 Contrast and Contrast Thresholds

The apparent brightness of any point in the visual target depends not only on the absolute luminance of that point but also on the local variations in the surrounding luminance (Figure 2.1). Contrast is the measure of this relative variation of luminance [14].

Two definitions of contrast have been commonly used for measuring the contrast of simple patterns. The Weber contrast is used to measure the local contrast of a single target of uniform luminance observed against a uniform background. An example is illustrated in Figure 2.2. The Weber contrast is defined as:

C = ΔL / L                                                (2.1)

where ΔL is the difference between the target luminance and the uniform background luminance, L.

(a)                              (b)

Figure 2.1. Demonstration that apparent brightness does not depend only on absolute luminance. Although the intensity of the inner squares is the same, the inner square in target (b) seems darker than the one in (a). This shows that the apparent intensity also depends on the luminance of the neighboring regions.

(a)                              (b)

Figure 2.2. Examples of spatial patterns where the Weber contrast is used. The Weber contrast of these simple spatial patterns is C = ΔL / L.


The second contrast definition is the Michelson contrast, which is used to measure the contrast of a periodic pattern such as a sinusoidal grating. It is defined as:

C = (Lmax − Lmin) / (Lmax + Lmin)                         (2.2)

where Lmax and Lmin are the maximum and minimum luminance values, respectively. Figure 2.3 illustrates the discussion.

Figure 2.3. The demonstration of Michelson contrast for a sinusoidal grating of a given spatial frequency. Lmax and Lmin are the maximum and minimum luminance values, respectively.
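Both contrast definitions translate directly into code. The following sketch is illustrative only (the grating values are made up) and assumes luminances are given as nonnegative numbers:

```python
import numpy as np

def weber_contrast(target_luminance, background_luminance):
    """Weber contrast C = dL / L (Eqn. 2.1): local contrast of a uniform
    target observed against a uniform background of luminance L."""
    dL = target_luminance - background_luminance
    return dL / background_luminance

def michelson_contrast(pattern):
    """Michelson contrast C = (Lmax - Lmin) / (Lmax + Lmin) (Eqn. 2.2)
    for a periodic pattern such as a sinusoidal grating."""
    lmax, lmin = np.max(pattern), np.min(pattern)
    return (lmax - lmin) / (lmax + lmin)

# A sinusoidal grating of mean 128 and amplitude 32: Michelson contrast ~0.25.
x = np.linspace(0, 2 * np.pi, 256)
grating = 128 + 32 * np.sin(8 * x)
```

Note that for a constant (zero-contrast) pattern both definitions give 0, and the Michelson contrast of the grating above is the amplitude divided by the mean, as in the experiments described next.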

Contrast threshold is defined as the minimum level at which the contrast of the visual target becomes visible. It is determined by means of visual experiments. For instance, the visual experiment for the case of Weber contrast is conducted as follows [16]. Firstly, the luminance of the target is set equal to the background luminance in Figure 2.2, and the targets in Figure 2.2 (a) and (b) are presented randomly to the subjects. Then, the subjects, standing at a specific distance away from the visual target, are asked which of the two regions (inside the circle and outside the circle) in the visual target is brighter. When the luminances of the two regions are equal, the subject gives a correct answer 50% of the time. Then the luminance of the target is increased until the subjects give the correct answer 75% of the time. This level of ΔL is defined as the just noticeable difference (JND) at that background luminance. The ratio of the JND to the background luminance is the contrast threshold.
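The procedure above can be simulated numerically. The sketch below is purely illustrative: the Weibull-shaped psychometric function and its parameters a, b are assumptions for the sake of the example, not values from the thesis; only the 50%/75% logic follows the description above.

```python
import math

def percent_correct(dL, a=2.0, b=1.5):
    """Assumed Weibull-shaped psychometric function: 50% correct (pure
    guessing) at dL = 0, approaching 100% for large dL. The parameters
    a and b are illustrative, not measured values."""
    return 0.5 + 0.5 * (1.0 - math.exp(-((dL / a) ** b)))

def find_jnd(background_luminance, target_rate=0.75):
    """Bisect for the luminance difference dL at which the subject answers
    correctly 75% of the time; this dL is the JND, and JND / L is the
    contrast threshold."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if percent_correct(mid) < target_rate:
            lo = mid
        else:
            hi = mid
    jnd = 0.5 * (lo + hi)
    return jnd, jnd / background_luminance
```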

In the Michelson contrast case, the contrast thresholds are determined as follows. The subjects, standing at a specific distance away from the visual targets, are asked whether they can differentiate the grating in Figure 2.4 (b) from the target with zero contrast in Figure 2.4 (a). If not, the amplitude of the grating is increased (Figure 2.4 (c)) until the subjects report it visible 50% of the time. The ratio of this amplitude of the grating to the sum of the maximum and minimum luminance values (Eqn. 2.2) is the contrast threshold for that spatial frequency. The contrast threshold is measured for each spatial frequency. Contrast sensitivity for a spatial frequency is the inverse of the contrast threshold of that frequency. In Figure 2.5, the plot of contrast sensitivity as a function of spatial frequency, i.e. the contrast sensitivity function, is illustrated. It shows a band-pass characteristic: HVS is most sensitive to the middle spatial frequencies, and the sensitivity decreases sharply towards the low and high frequencies, vanishing beyond a cutoff frequency.
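The thesis describes this band-pass curve only graphically. One widely used analytic approximation from the image-coding literature, due to Mannos and Sakrison, is sketched here for illustration; its formula and constants come from that literature, not from the thesis:

```python
import math

def csf_mannos_sakrison(f):
    """Approximate contrast sensitivity at spatial frequency f
    (cycles/degree): band-pass, peaking near 8 cycles/degree."""
    return 2.6 * (0.0192 + 0.114 * f) * math.exp(-((0.114 * f) ** 1.1))
```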

2.1.1 Light Adaptation

The contrast threshold for a spatial frequency depends on the mean luminance of the sinusoidal grating. For example, in Figure 2.4, the threshold is measured for a mean intensity of 128. If the mean intensity were different, the measured threshold would be different: the detection threshold increases with the mean luminance. This phenomenon of HVS is known as light adaptation [17, 18]. In Figure 2.6, the change in the thresholds as a function of the mean luminance, L, is illustrated [18]. As the mean luminance decreases, the thresholds decrease. The luminance is given in cd/m². Note that the thresholds illustrated here are measured to determine the maximum quantization level for a spatial DCT frequency that will yield imperceptible distortion in the resulting image in the case of 8x8 block-based DCT coding. In other words, these thresholds correspond to the amplitude of the sinusoidal grating illustrated in Figure 2.4. They do not correspond to the contrast threshold, which is the ratio of the amplitude of the sinusoidal grating to the mean of the grating.

(a)          (b)          (c)

Figure 2.4. The configuration for the experiments to measure the contrast threshold for each spatial frequency. The upper plots show the intensity level of the horizontal cross section of the lower spatial gratings.


Figure 2.5. Contrast sensitivity as a function of spatial frequency.

Figure 2.6. The change in the detection threshold as a function of mean luminance. From the top, the curves are for spatial DCT frequencies of {7,7}, {0,7}, {0,0}, {0,3} and {0,1} [18].


The thresholds illustrated in Figure 2.6 are formulated in [18] with the following equation:

t_ijk = t_ij . (c_00k / c_00)^a_t                         (2.3)

where t_ij is the threshold for the (i,j)-th coefficient of the 8x8 DCT transform, measured when the mean luminance corresponds to a gray level of 128; c_00k is the DC coefficient of the k-th 8x8 block of the image; a_t is the parameter that controls the strength of the masking, with a suggested value of 0.649; and c_00 is the DC coefficient corresponding to the mean luminance, which is equal to 1024 for an 8-bit image. Hence, for an 8x8 block with a mean value of 128, t_ijk = t_ij.
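Equation (2.3) is straightforward to compute. A minimal sketch, using the default values stated in the text:

```python
def luminance_adjusted_threshold(t_ij, c_00k, c_00=1024.0, a_t=0.649):
    """Eqn. (2.3): t_ijk = t_ij * (c_00k / c_00)^a_t.
    t_ij  : threshold for DCT frequency (i, j) measured at gray level 128
    c_00k : DC coefficient of the k-th 8x8 block
    c_00  : DC coefficient of the mean luminance (1024 for an 8-bit image)
    a_t   : masking-strength parameter, suggested value 0.649
    """
    return t_ij * (c_00k / c_00) ** a_t
```

For a block of mean gray level 128 (orthonormal DC coefficient 1024) the adjusted threshold equals t_ij; darker blocks get lower thresholds and brighter blocks get higher ones, as Figure 2.6 shows.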

2.1.2 Contrast Masking

Masking refers to the effect of one stimulus on the detectability of another stimulus. For instance, in the audio case, a strong noise can hide a weaker signal, such as a conversation between two people. In the image case, masking refers to a decrease in the visibility of an image component because of the presence of another. Experiments on contrast masking are reported in [19]. The subjects are asked to discriminate the superposition a+b of two sinusoidal gratings from b presented alone. Grating b is called the masker and grating a is called the signal. The contrast of the signal is varied to find the threshold of visibility. The configuration for the experiments is illustrated in Figure 2.7 and Figure 2.8. While there is no clearly visible difference between Figure 2.7 (b) and (c), the difference becomes visible in Figure 2.8 after the amplitude of the signal is increased.

In the case of 8x8 block-based DCT coding, there are 64 DCT frequencies, and each DCT frequency is masked by itself and by the other 63 frequencies (there can also be some masking effects across 8x8 blocks). Watson [18] neglects the masking effects of the other DCT frequency components and considers the case where each frequency is masked only by itself. The formulation is as follows:

m_ijk = max( t_ijk , |c_ijk|^w_ij . t_ijk^(1 − w_ij) )              (2.4)


where m_ijk is the masked threshold of the signal, c_ijk is the (i,j)-th DCT coefficient of the k-th block of the image, t_ijk is the threshold after light adaptation and w_ij is an exponent that lies between 0 and 1. The function is plotted in Figure 2.9 for typical empirical values of w_ij = 0.7 and t_ij = 2. An increase in the masker contrast raises the detection threshold of the signal. Strictly, the function describing the change in the contrast threshold of the signal should be four-dimensional: the contrast threshold is the dependent variable, and it depends on the masker contrast, the spatial frequency of the masker and the spatial frequency of the signal. An illustration of this case is given in [4].

The data in Figure 2.9 can be interpreted as follows. Assume that the value of the (i,j)-th coefficient of the k-th block of the image is 10000, i.e. c_ijk = 10000 (the graph is logarithmic). The masked threshold corresponding to c_ijk = 10000 is approximately equal to 1000, i.e. m_ijk = 1000. Then, HVS cannot perceive the difference between the two images where one is the original image with c_ijk = 10000 and the other is the modified image with c_ijk = 10000 ± 1000.
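Equation (2.4) and this numerical reading are easy to check in code. Note that plugging c_ijk = 10000 and t_ijk = 2 into (2.4) with w_ij = 0.7 gives roughly 780, the same order of magnitude as the value of about 1000 read off the logarithmic plot:

```python
def masked_threshold(c_ijk, t_ijk, w_ij=0.7):
    """Eqn. (2.4): m_ijk = max(t_ijk, |c_ijk|^w_ij * t_ijk^(1 - w_ij)).
    A coefficient can always absorb a change of at least t_ijk; a large
    coefficient masks itself and tolerates a larger change."""
    return max(t_ijk, abs(c_ijk) ** w_ij * t_ijk ** (1.0 - w_ij))
```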

In summary, the light adaptation and contrast masking phenomena of HVS are studied for the purpose of determining image-dependent maximum quantization levels that yield imperceptible distortion [18]. In this process, the image is first divided into blocks of 8x8 and the DCT of each block is computed. The DCT coefficients are denoted as c_ijk, where (i,j) is the DCT frequency and k is the number of the block. The visibility threshold for each spatial DCT frequency (i,j), t_ij, is determined by means of visual experiments (Figure 2.4). Then, the effect of the mean luminance of the sinusoidal grating on the visibility threshold is taken into account (2.3). At the next step, the effect of the contrast masking phenomenon is inserted into the process (2.4). The resulting thresholds give the maximum quantization levels that will yield imperceptible distortions in the image. In [18], these threshold formulations are used for image coding purposes. Specifically, they are used to determine the optimum quantization levels that will yield minimum perceptible distortion for a given bit rate. The same quantization levels can also be used to embed a watermark of maximum strength that remains invisible to HVS. A method based on this approach [10, 11] will be explained in Chapter 3.
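The whole summary above can be sketched as one small pipeline. This is a sketch under stated assumptions: the base thresholds t_ij are passed in as a placeholder array (the measured values would come from the experiments behind Figure 2.6), and, following Watson, the DC term is exempted from self-masking (w_00 = 0):

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix, so that D @ block @ D.T is the 2-D DCT."""
    k = np.arange(n)
    d = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    d[0, :] *= 1.0 / np.sqrt(2.0)
    return d * np.sqrt(2.0 / n)

def perceptual_thresholds(image, t_ij, c_00=1024.0, a_t=0.649, w_ij=0.7):
    """Per-block JND thresholds: 8x8 block DCT, light adaptation (2.3),
    then self-masking (2.4). `t_ij` is an 8x8 array of base thresholds."""
    D = dct_matrix(8)
    h, w = image.shape
    m = np.empty((h // 8, w // 8, 8, 8))
    for bi in range(h // 8):
        for bj in range(w // 8):
            block = image[8 * bi:8 * bi + 8, 8 * bj:8 * bj + 8].astype(float)
            c = D @ block @ D.T                      # DCT coefficients c_ijk
            t = t_ij * (c[0, 0] / c_00) ** a_t       # light adaptation (2.3)
            mb = np.maximum(t, np.abs(c) ** w_ij * t ** (1 - w_ij))  # (2.4)
            mb[0, 0] = t[0, 0]   # Watson uses w_00 = 0: no DC self-masking
            m[bi, bj] = mb
    return m
```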


As noted, the visual thresholds are measured for the spatial DCT frequencies, since DCT-based image compression methods are widely used. Another compression method used extensively in image coding is wavelet-based compression [38]. The image is decomposed into subbands that vary in spatial frequency and orientation. Uniform quantization of the coefficients in each subband usually yields visible artifacts. In order to eliminate the visible distortions in the compressed image, the visual thresholds that determine the maximum quantization level for each subband are determined by means of visual experiments [20]. These thresholds are given in Table 2.1 for each subband. The reader may refer to [20] for the details of these visual experiments.

The visual thresholds measured in this wavelet approach for the purpose of image compression are also used for the purpose of image watermarking [10, 11]. The watermark is inserted into the coefficients of a subband that are greater than these thresholds. The strength of the watermark obviously should not exceed the visual threshold of the subband. The method is explained in detail in Chapter 3.

(a)          (b)          (c)

Figure 2.7. The configuration for the experiments conducted to study contrast masking. (a) is the signal. The aim is to measure the contrast threshold of the signal in the presence of the masker (b). The subjects are asked whether they can discriminate the masker from the masker+signal (c). For this case, the visual difference between (b) and (c) is not significant.

(a)          (b)          (c)

Figure 2.8. The amplitude of the signal in Figure 2.7 (a) is increased, and the difference between the masker and masker + signal becomes visible.


Figure 2.9. The demonstration of the change in the contrast threshold as a function of masker contrast. c_ijk is the contrast of the masker; m_ijk is the contrast threshold of the signal. The plot is given in logarithmic scale [18].

Table 2.1. Quantization levels for a four-level DWT transform. The 9-7 biorthogonal filters [38] are used as decomposition filters in the DWT process. The display resolution during the visual experiments is 32 pixels/degree.

Orientation   Level 1   Level 2   Level 3   Level 4
LL            14.05     11.11     11.36     14.50
HL            23.03     14.68     12.71     14.16
HH            58.76     28.41     19.54     17.86
LH            23.03     14.69     12.71     14.16
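Using Table 2.1, selecting the subband coefficients that can carry a watermark reduces to a per-subband comparison. A minimal sketch (the coefficient array below is synthetic, for illustration only):

```python
import numpy as np

# Visual thresholds from Table 2.1, indexed by (orientation, level).
T = {
    ('LL', 1): 14.05, ('LL', 2): 11.11, ('LL', 3): 11.36, ('LL', 4): 14.50,
    ('HL', 1): 23.03, ('HL', 2): 14.68, ('HL', 3): 12.71, ('HL', 4): 14.16,
    ('HH', 1): 58.76, ('HH', 2): 28.41, ('HH', 3): 19.54, ('HH', 4): 17.86,
    ('LH', 1): 23.03, ('LH', 2): 14.69, ('LH', 3): 12.71, ('LH', 4): 14.16,
}

def embeddable_mask(coeffs, orientation, level):
    """Boolean mask of subband coefficients whose magnitude exceeds the
    visual threshold of that subband; only these carry the watermark, and
    the watermark strength must stay below the same threshold [10, 11]."""
    return np.abs(coeffs) > T[(orientation, level)]
```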


2.2 Spatial and Temporal Masking

The spatial and temporal masking phenomena of HVS are studied in [21], where a nonlinear spatiotemporal model of human threshold vision is proposed. The model prediction is compared with the experimental data on the spatial and temporal masking phenomena of HVS. After verifying that the model reflects the properties of human visual perception accurately, the maximum bit-rate savings for image coding obtainable by exploiting the properties of HVS are investigated.

Spatial masking refers to the masking at spatial luminance edges. The configuration and results of the visual experiments conducted for analyzing spatial masking are given in Figure 2.10 [21]. The variance of a narrow bar of white noise is increased until the noise is visible to the subjects. The variance at which the noise becomes visible to the subjects is called the visibility threshold. The visibility threshold is plotted as a function of the distance between the noise bar and the spatial edge. The visibility threshold becomes higher near the edge, especially on the dark side and somewhat on the bright side. In terms of image coding, this phenomenon suggests that the image regions near a spatial edge can be quantized with coarser levels to decrease the bit rate [23]. The corresponding idea in the image watermarking domain is to embed a stronger watermark into the regions near a spatial edge in order to increase the robustness of the watermark. This idea is used in [8, 9].

Temporal masking refers to the masking at temporal luminance discontinuities. In the corresponding experiments, a noise flash of 40 ms duration is superimposed on a spatially uniform field [21]. Then, the luminance of the uniform field is suddenly changed from bright to dark or from dark to bright. Visibility thresholds become higher for about 100 ms after both dark-to-bright and bright-to-dark transitions. The results of these experiments and the predictions of the proposed model are illustrated in Figure 2.11.


Figure 2.10. Visibility thresholds for a narrow bar of white noise in the neighborhood of a spatial edge [22].


(a)          (b)

Figure 2.11. Visibility thresholds for a 40 ms flash of dynamic white noise after a temporal brightness jump (a) from I=50 to I=180, (b) from I=180 to I=50. ∆ are visibility thresholds measured by visual experiments; the solid lines are the predictions of Girod's proposed model [21]. (I is the intensity level.)


2.3 Foveation

The human retina, which is the inner layer of the eye (Figure 2.12), is the sensory part of the human eye. It mainly consists of light-sensitive receptor cells, ganglion cells and bipolar cells. The light-sensitive receptor cells are of two kinds: the rods and the cones. Rods are very sensitive to light and provide low-light vision. Cones have low sensitivity to light and provide daylight vision. There are three types of cones in the human retina, absorbing long-wavelength (red), middle-wavelength (green) and short-wavelength (blue) light, respectively. They enable us to see colors. The ganglion and bipolar cells form a path from the rods and cones to the brain. The image signal that is sensed by the rods and cones is transmitted via this path to the brain [24, 25].

Figure 2.12. Anatomy of the human eye [22].

The density distribution of the light-sensitive receptor cells and ganglion cells is illustrated in Figure 2.13 as a function of eccentricity, where 0 degrees corresponds to the fovea and the eccentricity increases with the distance of the cells from the fovea (see Figure 2.12). The density of cones and ganglion cells is maximal in the small region just opposite the lens. Most of the three million cones in each retina are confined to this small region, called the fovea [24, 25]. While the density is highest at the fovea, it decreases rapidly with increasing eccentricity. The characteristics of this density distribution directly determine the spatial resolution, or sampling density, of HVS [26]. The sampling density is maximal at the fovea and decreases rapidly with increasing eccentricity. As a result, our sharpest and most colorful images are confined to a small area of view. The region of the image that is projected onto the fovea is perceived clearly, while the other parts of the image are perceived as somewhat blurred. In Figure 2.14, the original and foveated versions of the Lena image are illustrated. If a human observer gazes at the center of the Lena image (the foveation point), then the foveated and original images are perceptually equal.

Figure 2.13. Rod, cone and ganglion cell density as a function of eccentricity. The density of cones and ganglion cells is maximal at zero eccentricity, which corresponds to the fovea [26].

(a)          (b)

Figure 2.14. Original Lena image (a), and its foveated version (b).


The contrast sensitivity phenomenon of HVS was explained in the previous section, together with the experiments conducted to determine the contrast thresholds for each spatial frequency. Similar experiments have also been conducted to determine the contrast sensitivity of HVS as a function of spatial frequency and one more variable, eccentricity [27]. The configuration for the experiments is illustrated in Figure 2.15. Briefly, the subjects are asked whether they sense the contrast for a specific spatial frequency and eccentricity. If the answer is no, the contrast of the target is increased. By this process, the contrast thresholds of HVS as a function of spatial frequency and eccentricity are determined. The experiments were made by Robson & Graham (1981). The experimental data is modeled in [27] with the following equation:

CT(f, e) = CT_0 . exp( α f (e + e_2) / e_2 )                       (2.5)

where f is the spatial frequency (cycles per degree), e is the retinal eccentricity (degrees), CT_0 is the minimum contrast threshold, α is the spatial frequency decay constant and e_2 is the half-resolution eccentricity. The fit of the model to the experimental data is illustrated in Figure 2.16. The best-fitting parameters for the data are α = 0.106, e_2 = 2.3 and CT_0 = 1/64. The contrast sensitivity, CS(f, e), is defined as the inverse of the contrast threshold.
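Equation (2.5) with the fitted parameters above translates directly into code:

```python
import math

# Best-fitting parameters reported for the Robson & Graham data [27].
ALPHA = 0.106    # spatial frequency decay constant
E2 = 2.3         # half-resolution eccentricity (degrees)
CT0 = 1.0 / 64   # minimum contrast threshold

def contrast_threshold(f, e):
    """Eqn. (2.5): CT(f, e) = CT0 * exp(alpha * f * (e + e2) / e2),
    with f in cycles/degree and e in degrees of retinal eccentricity."""
    return CT0 * math.exp(ALPHA * f * (e + E2) / E2)

def contrast_sensitivity(f, e):
    """CS(f, e) is the inverse of the contrast threshold."""
    return 1.0 / contrast_threshold(f, e)
```

As Figure 2.16 shows, sensitivity falls both with increasing spatial frequency at a fixed eccentricity and with increasing eccentricity at a fixed spatial frequency.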

The foveation phenomenon of HVS has been used for image and video coding purposes in a number of studies. In [27], a foveated multiresolution pyramid video coder/decoder is developed. The proposed system uses a foveated multiresolution pyramid to code each image into 5 or 6 regions of varying resolution. After eliminating the spatial edge artifacts between the regions created by the foveation, each level of the pyramid is motion compensated, multiresolution pyramid coded, and thresholded/quantized with respect to the contrast thresholds as a function of spatial frequency and retinal eccentricity. They end up with zero-tree coding of the quantization results. A Laplacian pyramid is used as the multiresolution pyramid.

A similar approach for image coding is given in [26]. In this case a wavelet pyramid is used instead of the Laplacian pyramid. The image is decomposed into subband levels by using orthogonal filters. Then the coefficients of each subband are quantized with the foveation-based contrast sensitivity for that subband. The results of the quantization process are passed through a modified SPIHT coding [28].


Figure 2.15. The configuration for the experiments to determine the contrast thresholds of HVS as a function of spatial frequency, f, and visual angle e(v,x).

Figure 2.16. Contrast sensitivity for patches of sinusoidal grating as a function of retinal eccentricity (degrees of visual angle), for a range of spatial frequencies. The symbols and connecting dashed lines are the measurements; the solid curves are the predictions of equation (2.5) [27].


While computing the foveation-based contrast sensitivity for each subband, Wang et al. [26] first take the effect of the cutoff frequency into the formulation of the contrast sensitivity:

S_f(v, f, x) = exp( −0.0461 . f . e(v,x) )    for f ≤ f_m(x)
S_f(v, f, x) = 0                              for f > f_m(x)       (2.6)

where x is the pixel location, v denotes the viewing distance, f gives the spatial frequency (cycles/degree), e(v,x) is the retinal eccentricity (degrees) and f_m(x) is the cutoff frequency for a given location x (Figure 2.15). Above this cutoff frequency, it is not possible to see any higher frequency components.

The cutoff frequency is determined by two factors. The first is the critical frequency, the frequency at which the contrast threshold reaches 1 for a specific visual angle. The determination of the critical frequency is illustrated in Figure 2.17: the spatial frequency of the visual target is increased until the contrast threshold is 1 for a specific visual angle. This frequency is the critical frequency for that specific visual angle, e(v,x). The second factor, which limits the cutoff frequency, is the display resolution, r. Because of the sampling theorem, the highest frequency that can be represented without aliasing by the display is half of the display resolution. By combining these two constraints, the cutoff frequency is expressed as:

f_m(x) = min( f_c , f_d )

where f_c is the critical frequency and f_d is half of the display resolution, r.
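Solving equation (2.5) for the frequency at which the contrast threshold reaches 1 gives a closed form for the critical frequency, which then combines with the display Nyquist limit as described. A sketch, reusing the fitted parameters of (2.5):

```python
import math

ALPHA = 0.106    # parameters of Eqn. (2.5), repeated here
E2 = 2.3
CT0 = 1.0 / 64

def critical_frequency(e):
    """Frequency at which the contrast threshold of Eqn. (2.5) reaches 1:
    solving CT0 * exp(alpha * f * (e + e2) / e2) = 1 for f gives
    f_c = e2 * ln(1 / CT0) / (alpha * (e + e2))   (cycles/degree)."""
    return E2 * math.log(1.0 / CT0) / (ALPHA * (e + E2))

def cutoff_frequency(e, r):
    """f_m = min(f_c, f_d), where f_d = r / 2 is the display Nyquist
    frequency for a display resolution of r pixels/degree."""
    return min(critical_frequency(e), r / 2.0)
```

At the fovea the critical frequency is high, so a typical display (e.g. the 32 pixels/degree of Table 2.1) is the limiting factor, whereas far in the periphery the eye itself becomes the limit.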

The contrast sensitivity function based on foveation, S_f(v, f, x), can be adapted to each subband of the DWT domain. Figure 2.18 illustrates the contrast sensitivity function adapted to each subband of the wavelet transform [26].

The contrast sensitivity function based on foveation is used for the purpose of image coding. Similarly, this function can also be used for the purpose of image watermarking. Since HVS cannot clearly see the peripheral regions while it gazes at a point in the image, the strength of the watermark embedded into those regions can be higher than the strength of the watermark embedded into the foveated regions. This is the fundamental idea behind the proposed method for the watermarking of images. In Chapter 4, the details of our proposed method based on foveation are given.


Figure 2.17. The configuration for the experiments to determine the critical frequency for a specific visual angle e(v,x). The spatial frequency of the target is increased until the contrast threshold is 1.


(a)          (b)

Figure 2.18. (a) Discrete wavelet transform structure. (b) Illustration of the corresponding foveation-based contrast sensitivity function for each subband. Brightness shows the strength of the sensitivity [26].


2.4 Temporal Sensitivity

Temporal sensitivity refers to the sensitivity of HVS to temporal fluctuations in a spatial pattern. These temporal fluctuations can be very slow, such as the growth of a plant, or very fast, like the rapid fluctuations in the intensity level of an electric lamp in a room. Both examples give some insight into the characteristics of the temporal sensitivity of HVS that will be examined in this section.

In a more formal manner, temporal sensitivity refers to the influence of the temporal dimension of light (the stimulus for vision) on the perception of HVS. It depends not only on the temporal configuration of the visual stimulus, but also on the spatial configuration of the target, the size of the target, the background luminance and the surround luminance [29, 30, 31, 32]. Kelly [30] examined the effects of the size of the target on temporal sensitivity and also conducted visual experiments on the effects of the presence of edges in the spatial pattern on the temporal sensitivity of HVS. The effects of the luminance of the surround and the effects of the spatial frequency of the visual target on temporal sensitivity were studied by Roufs [31] and Robson [32], respectively. A detailed overview of the influences of the above factors was given by Watson [29]. A model for temporal sensitivity is also proposed there, and comparisons between the visual experiment data and the model predictions are presented. (One can refer to [29] for a detailed explanation of the proposed model.)

In this section, a brief summary of Watson's research [29] on this topic is given. First of all, some basic notation is introduced for a visual stimulus that is distributed over space and time. Then, the definition of contrast for a three-dimensional visual stimulus (two dimensions for space, one for time) is given, and the assumptions about the contrast distribution in the laboratory environment are stated. The next step presents how the visual experiments are conducted in order to find the Temporal Contrast Sensitivity Function (TCSF) of HVS. This step also gives the effects of changes in the background luminance, the size of the target and the spatial configuration of the visual target on the TCSF. Since most image and video coding standards are based on block-DCT methods, the effects of the spatial configuration of the visual target on the TCSF are of great importance. Therefore, in the last part, a review of a recent work [33] on how the TCSF changes for a spatial grating of a specific DCT frequency is given. This part is especially important, since it forms the basis of the proposed method for temporal watermarking of digital videos, given in Chapter 5.


2.4.1 Fundamental Definitions

The stimulus for vision can be modeled as a three-dimensional function, I(x, y, t), where x and y are the spatial horizontal and vertical directions, respectively, and t denotes time. The background intensity is denoted as I_B and the surround intensity as I_S. The surround intensity, I_S, is usually set equal to the background intensity, I_B. Various definitions of I_B are possible, e.g. the space-average intensity of the image, the unvarying level upon which the target is superposed, or the space-time average of the image. The intensity distribution of the target is designated as I_T(x, y, t). It is equal to the difference of the overall distribution, I(x, y, t), from the background intensity, I_B. The definitions are illustrated in Figure 2.19.

Figure 2.19. Some terms used to describe visual stimuli: (a) The spatial configuration of the image. The target and background are superposed on some specified area, shown here as a disk. The surround lies outside the target and background. (b) A horizontal cross section through the intensity distribution I(x, y, t) of the image. The surround has intensity I_S, the background I_B, and the target I_T(x, y, t). Target contrast is the ratio I_T / I_B. [29]


In Section 2.1, the definitions of contrast for a two-dimensional target (image) were given. In a similar way, contrast for a three-dimensional visual target (video) can also be defined, as the ratio of the target intensity to the background intensity:

C(x, y, t) = I_T(x, y, t) / I_B                           (2.7)

Using (2.7), the overall intensity can be written as

I(x, y, t) = I_B + I_T(x, y, t) = I_B (1 + C(x, y, t))    (2.8)

According to these formulations, the stimulus is a function of the background intensity, I_B, and the contrast distribution, C(x, y, t). The reason for such a separation of the signal into background and contrast terms is the more invariant character of temporal sensitivity with respect to contrast than with respect to intensity.

In many experimental situations, the contrast distribution is separable, i.e.,

C(x, y, t) = C(x, y) . C(t)                               (2.9)

This separability means that the spatial contrast distribution, C(x, y), is invariant with respect to time and that the temporal distribution is the same at all points in the image [29]. Since the aim of the experiments is to investigate the effects of the temporal dimension of the visual stimulus on the perception of HVS, C(t) is used as the visual stimulus during the visual experiments and C(x, y) is normalized to have an overall contrast of 1.
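The separable form of equations (2.8) and (2.9) is easy to realize numerically. The sketch below builds a flickering-grating stimulus of the kind used in these experiments; the frame rate, frequencies and amplitudes are arbitrary illustrative choices, not values from the thesis:

```python
import numpy as np

def separable_stimulus(I_B, C_xy, c_t):
    """Build I(x, y, t) = I_B * (1 + C(x, y) * C(t))  (Eqns. 2.8-2.9)
    from a spatial contrast pattern C_xy (2-D, normalized to peak 1)
    and a temporal modulation c_t (1-D). Returns shape (T, H, W)."""
    C_xy = C_xy / np.max(np.abs(C_xy))         # overall spatial contrast 1
    C = c_t[:, None, None] * C_xy[None, :, :]  # separable C(x, y, t)
    return I_B * (1.0 + C)

# Example: a vertical grating flickering at 8 Hz, sampled at 60 frames/s.
t = np.arange(60) / 60.0
c_t = 0.2 * np.sin(2 * np.pi * 8 * t)          # temporal contrast C(t)
x = np.linspace(0, 2 * np.pi, 64)
C_xy = np.tile(np.sin(4 * x), (64, 1))         # spatial pattern C(x, y)
video = separable_stimulus(128.0, C_xy, c_t)
```

In the threshold experiments described next, the amplitude of C(t) would be raised step by step until the flicker becomes visible.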

2.4.2 Temporal Contrast Sensitivity Function

In Figure 2.19, the configuration of the visual experiment to find the temporal contrast sensitivity function is illustrated. The visual target (Figure 2.20(a)) is modulated with a sinusoidal function, c(t) in Figure 2.20(b), and presented to a subject standing at a specific distance. The subject is asked whether the modulated target is distinguishable from a target with zero contrast. When the answer is negative, the amplitude of the sinusoid is increased. The process is repeated until the temporal fluctuations in the visual target become visible. The threshold of the sinusoid at which the target becomes visible is called the temporal contrast threshold and its reciprocal is called the temporal contrast sensitivity. The same experiment is conducted for each temporal frequency. In this manner, the temporal contrast sensitivity function (TCSF), which gives the contrast sensitivity against temporal frequency, is determined.
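The measurement procedure above can be sketched as a simple ascending loop; the observer callback, starting amplitude and step factor below are hypothetical stand-ins for the actual experimental protocol:

```python
def temporal_contrast_threshold(is_visible, start=0.001, step=1.25, max_amp=1.0):
    # Raise the sinusoid's amplitude until the subject reports visible
    # flicker; the amplitude at that point is the temporal contrast
    # threshold, and its reciprocal is the temporal contrast sensitivity.
    amp = start
    while not is_visible(amp) and amp < max_amp:
        amp *= step
    return amp, 1.0 / amp

# Simulated observer whose true threshold is a contrast of 0.01:
threshold, sensitivity = temporal_contrast_threshold(lambda a: a >= 0.01)
```

Repeating this for each temporal frequency of the modulating sinusoid traces out the TCSF point by point.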

Figure 2.20  The target in (a) is modulated with respect to the temporal contrast function, C(t), in (b). The amplitude of C(t), I, is increased until the temporal fluctuations in the target become visible to the subject.


Figure 2.21  Temporal contrast sensitivity function of HVS for different background luminances. TCSF peaks around 5-10 Hz. As the background luminance increases, a shift to higher temporal frequencies occurs. [22]

TCSF is illustrated in Figure 2.21. As noted previously, the size of the target, the background luminance and the spatial configuration of the target affect the characteristics of TCSF. Visual experiments show that an increase in the size of the target decreases the sensitivity at low temporal frequencies, while not affecting the sensitivity at high temporal frequencies [29]. An increase in the background luminance causes a drop in the low temporal frequency limb of TCSF and also shifts TCSF to higher temporal frequencies [29]. A modification in the spatial configuration of the visual target (Figure 2.22) does not change the high frequency limb of TCSF. However, the presence of edges or high spatial frequencies in the target raises the low frequency limb of TCSF. Figure 2.23 illustrates the effects of the spatial configuration of the target on TCSF.


Figure 2.22  The spatial configurations of two different targets. The fundamental spatial frequency of the target in (a) is two times that of the target in (b). Both targets are modulated with C(t) in Figure 2.20(b). The measured TCSF will be different for each visual target.


Figure 2.23  The effect of spatial frequency upon the temporal contrast sensitivity function. The target was a sinusoidal grating with a spatial frequency of 0.5, 4, 16 or 22 cycles/degree. Background luminance was 20 cd/m². The target was 2.5° x 2.5° and the surround was 10° x 10°. The subjects were 2 m away from the visual target. [32]


2.4.3 Temporal Contrast Thresholds for Spatial DCT Frequencies

In image and video processing, most of the compression standards are based on block-DCT methods. The visibility of the quantization noise introduced in the DCT domain by coding of images or videos is of great concern, since it affects the quality of the image or video. In order to achieve the minimum bit rate with an acceptable image quality, the maximum quantization level that yields imperceptible quantization noise for human observers should be determined. In [18], optimum quantization levels in the DCT domain for a given bit rate are derived by means of visual experiments for an individual image.

The quantization error resulting from the coding of images is a two-dimensional quantity. The quantization error resulting from the coding of video, however, is a three-dimensional quantity, with one more dimension, time. This quantization error is called dynamic quantization error [33].

The visibility of the quantization error resulting from DCT-based coding of videos is studied in [33]. The maximum level of the dynamic quantization error that is not perceptible to HVS is measured. This maximum level of dynamic quantization noise is simply the temporal contrast threshold.

The temporal contrast thresholds are measured in [33] for the spatial DCT frequencies {0,0}, {0,1}, {0,2}, {0,3}, {0,5}, {0,7}, {1,1}, {2,2}, {3,3}, {5,5}, {7,7} and the temporal frequencies 0, 1, 2, 4, 6, 10, 12, 15 and 30 Hz. Figure 2.24 illustrates the results for the spatial DCT frequencies {0,0}, {0,7} and {3,3}. An increase in threshold at high spatial and temporal frequencies can be observed easily. The data in Figure 2.24 shows a roughly low-pass characteristic at low spatial and temporal frequencies.

All these spatiotemporal data can be modeled as a product of a temporal function, T_w(w), a spatial function, T_f(u, v), and an orientation function, T_a(u, v):

T(u, v, w) = T_o . T_w(w) . T_f(u, v) . T_a(u, v)                                (2.10)

where T_o is a global or minimum threshold. T_w(w), T_f(u, v) and T_a(u, v) are illustrated in Figure 2.25.

In [33], all the visual experiments to measure temporal contrast thresholds are conducted for the specific purpose of defining a new digital video quality metric. The aim of such a metric is to evaluate the visual quality of digital video. Since the metric is based on the basics of HVS, it gives a more reliable prediction of the visual quality of the video when the observer is a human.

In Chapter 5, the temporal contrast thresholds are used for a different purpose: they are exploited to determine the place and strength of the watermark that is embedded into digital video.

Figure 2.24  Temporal contrast thresholds for spatial DCT frequencies of {0,0}, {0,7} and {3,3}. Points are data of two observers. The thicker curve is the model. [33]

Figure 2.25  Temporal (a), spatial (b) and orientation (c) components of the dynamic DCT threshold model. [33]


CHAPTER 3

WATERMARKING BASED ON VISUAL MODELS

This chapter presents the basic watermarking methods in the literature that are based on perceptual models of the Human Visual System. As noted in Chapter 2, the models are derived by means of psycho-visual experiments. Specifically, most of the methods presented here use the contrast thresholds that measure the sensitivity of HVS at different spatial frequencies. By exploiting characteristics of HVS such as light adaptation and contrast masking, the contrast thresholds are pushed to the maximum possible level. The resulting levels give the maximum possible watermark strength that produces visually undistorted watermarked images.

In the first section of this chapter, image watermarking methods are examined. In the second part, two well-known video watermarking methods are presented.

3.1 Image Watermarking Methods based on Visual Models

As mentioned, an efficient and useful watermarking scheme should have properties such as robustness, capacity and imperceptibility [1,2,10,11]. The owner of the image/video wants to be able to prove his/her ownership as long as the quality of the digital content remains. Hence, the watermark should still be detectable after the digital content passes through any signal operation that does not distort the image quality considerably. This refers to robustness. Capacity, on the other hand, is directly related to robustness. It refers to the ability to detect the watermark with a low probability of error as the number of differently watermarked versions of an image or video increases. Finally, imperceptibility refers to the visual similarity between the original content and the watermarked content. Obviously, most owners of digital content do not want any kind of degradation in their works. Therefore, it is required that the watermarked image/video have the same visual quality as the original one. Due to these requirements, researchers working in the digital watermarking area should use HVS models, which were mostly developed for image and video coding applications.

In [34], an image watermarking method that embeds the watermark by employing a multiresolution fusion technique is proposed. The method incorporates an HVS model, which gives the contrast sensitivity for a particular pair of spatial frequencies, as

C(u, v) = 5.05 e^{-0.178(u+v)} (e^{0.1(u+v)} - 1)                                (3.1)

where C(u, v) is the contrast sensitivity matrix and u, v are the spatial frequencies, given in units of cycles per degree of visual angle. Specifically, in this method, the image is decomposed into subbands by using the wavelet transform [34]. Each subband is segmented into non-overlapping rectangles. The watermark is embedded using a measure called saliency, which quantifies the importance of an image component. The saliency of a rectangular segment is computed as the sum of the product of the contrast sensitivity function and the squared magnitude of the Discrete Fourier Transform of the segment. This measures how important the rectangular segment is for HVS. The more important the segment, the stronger the presence of the watermark in it, according to the proposed method.
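The saliency computation can be sketched as follows, using the sensitivity model (3.1). The function names and the mapping of DFT indices to frequencies are illustrative assumptions, not the exact implementation of [34]:

```python
import numpy as np

def csf(u, v):
    # Contrast sensitivity model of equation (3.1); u, v in cycles/degree.
    return 5.05 * np.exp(-0.178 * (u + v)) * (np.exp(0.1 * (u + v)) - 1.0)

def saliency(segment):
    # Saliency of a rectangular segment: sum over frequencies of
    # CSF(u, v) * |DFT(segment)|^2, as described for the method in [34].
    # Here the DFT bin indices are used directly as frequency coordinates.
    F = np.fft.fft2(segment)
    h, w = segment.shape
    u = np.arange(h).reshape(-1, 1)
    v = np.arange(w).reshape(1, -1)
    return float(np.sum(csf(u, v) * np.abs(F) ** 2))
```

Note that csf(0, 0) = 0, so a perfectly flat segment, whose spectral energy sits entirely at DC, has zero saliency and would receive a weak watermark.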

Another approach [9] exploits the contrast masking and spatial masking phenomena of HVS to guarantee the invisibility of the embedded watermark. The image is decomposed into 8x8 blocks and the DCT of each block is calculated. A visual mask is computed for each block. The watermark is generated by scaling the visual mask and multiplying it with the DCT of a maximal-length pseudo-noise sequence. This watermark is added to the corresponding DCT block. Then, the inverse DCT of each watermarked block is computed. At this step, spatial masking is used to check whether the watermark is invisible and to control the scaling factor. If the watermarking causes a visible distortion in the image block, the scaling factor is decreased and the process is repeated.

The model used in [9] for the visual mask expresses the contrast threshold as a function of the frequency f, the masking frequency f_m and the masking contrast c_m (see Section 2.1.2 for the experiments to model contrast masking):

c(f, f_m, c_m) = c_o(f) . Max{1, [k(f/f_m) . c_m]^α}                             (3.2)

where c_o(f) is the detection threshold at frequency f. The detection threshold is the minimum amplitude of a sinusoidal grating at which the grating can be discriminated from a zero-contrast grating (see Section 2.1). In the case of the 8x8 DCT of each block, each frequency component is masked by itself and by the other 63 spatial frequencies. Therefore, a summation rule of the form (3.3) is used to include the effect of each spatial frequency in the calculation of the contrast threshold:

c(f) = [ Σ_{f_m} c(f, f_m)^β ]^{1/β}                                             (3.3)

where the value of β is 2. If the contrast error between the original and watermarked image is smaller than c(f), then the model predicts that the watermarked block is visually equal to the original.
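The two-stage threshold computation of (3.2) and (3.3) can be sketched as below. The shape of the weighting function k and the exponent α are not specified in this chapter, so the values used here are placeholders:

```python
import numpy as np

def masked_threshold(c0_f, k_ratio, c_m, alpha=0.6):
    # Equation (3.2): c(f, f_m, c_m) = c_o(f) . Max{1, [k(f/f_m) . c_m]^alpha}.
    # k_ratio stands for the value of k(f/f_m); alpha is a placeholder exponent.
    return c0_f * max(1.0, (k_ratio * c_m) ** alpha)

def pooled_threshold(per_masker_thresholds, beta=2.0):
    # Equation (3.3): c(f) = [ sum over f_m of c(f, f_m)^beta ]^(1/beta).
    t = np.asarray(per_masker_thresholds, dtype=float)
    return float(np.sum(t ** beta) ** (1.0 / beta))

# With zero masking contrast the threshold stays at the detection threshold:
base = masked_threshold(0.02, 1.0, 0.0)
# A strong masker elevates it above the detection threshold:
elevated = masked_threshold(0.02, 1.0, 4.0)
# Pooling the per-masker thresholds with beta = 2:
pooled = pooled_threshold([3.0, 4.0])
```

The Max{1, .} term guarantees that masking can only raise the threshold, never lower it below the unmasked detection threshold c_o(f).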

The spatial model, which is used to check the imperceptibility of the watermark, is a modified version of Girod's w-model [21]. The w-model predicts the spatial and temporal masking effects of HVS accurately. In [23], this model is used to calculate the tolerable error level (TEL) of each pixel in the image. If the error resulting from image coding is larger than the TEL of a pixel, the degradation becomes visible in that part of the image. The proposed watermarking method [9] also uses this model to verify that the watermark designed in the DCT domain with the contrast masking model is invisible in local spatial regions. Each watermark coefficient is compared with the TEL to assure invisibility. If the watermark coefficient is visible, the process is repeated with a decreased scaling factor.

A fundamental approach to perceptual watermarking is proposed in [10,11] by Podilchuk et al. Similar to [9], the image is first segmented into 8x8 non-overlapping blocks. The DCT of each block is computed and the DCT coefficients are watermarked by considering the just noticeable difference (JND). The JND is simply the detection threshold measured by visual experiments. The effects of luminance masking and contrast masking are also incorporated into the computation of the detection thresholds. The watermarking scheme is formulated as follows:


I*_{u,v,b} = I_{u,v,b} + JND_{u,v,b} . w_{u,v,b}   if I_{u,v,b} > JND_{u,v,b}
I*_{u,v,b} = I_{u,v,b}                             otherwise                     (3.4)

where I_{u,v,b} is the DCT coefficient of the image block b, JND_{u,v,b} is the corresponding JND matrix entry of that block and w_{u,v,b} is the watermark sequence, generated from a zero-mean, unit-variance Gaussian distribution. In (3.4), only the coefficients greater than the JND levels are watermarked. Since the coefficients lower than the JND levels are not significant for HVS, these coefficients would most probably be eliminated after a possible compression stage. The JND levels are used to determine the strength of the watermark. Otherwise, the distortion resulting from the watermarking process would become visible.
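The embedding rule (3.4) amounts to a masked additive update per block; a minimal NumPy sketch (the block values and the uniform JND below are made up for illustration):

```python
import numpy as np

def embed_block(dct_block, jnd_block, rng):
    # Rule (3.4): coefficients above their JND receive JND * w, with w drawn
    # from a zero-mean, unit-variance Gaussian; the rest pass through.
    w = rng.standard_normal(dct_block.shape)
    return np.where(dct_block > jnd_block, dct_block + jnd_block * w, dct_block)

rng = np.random.default_rng(0)
block = np.array([[50.0, 1.0], [10.0, 0.5]])
jnd = np.full_like(block, 2.0)
marked = embed_block(block, jnd, rng)
```

The two coefficients below the JND of 2.0 are untouched, so a subsequent compression stage that discards perceptually insignificant coefficients cannot erase watermark energy that was never placed there.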

Podilchuk et al [10] also suggest a wavelet-based watermarking method by using the visibility thresholds measured by means of visual experiments for each subband of the wavelet transform (see Table 2.1 for the detection thresholds). The coefficients of each subband are watermarked with the following scheme:

I*_{u,v,l,f} = I_{u,v,l,f} + JND_{l,f} . w_{u,v,l,f}   if I_{u,v,l,f} > JND_{l,f}
I*_{u,v,l,f} = I_{u,v,l,f}                             otherwise                 (3.5)

where I_{u,v,l,f} refers to the wavelet coefficient at position (u, v) in resolution level l and frequency orientation f, I*_{u,v,l,f} refers to the watermarked wavelet coefficient, w_{u,v,l,f} corresponds to the watermark sequence and JND_{l,f} is the measured just noticeable difference (detection threshold) for the subband of resolution level l and frequency orientation f. The reason for using the watermarking scheme in (3.5) is the same as in the case of DCT-based watermarking (3.4).

Podilchuk et al [10] also make a comparison between the DCT-based and wavelet-based watermarking methods and the spread spectrum watermarking method [7]. One disadvantage of the spread spectrum method they note is visible distortion in the watermarked image when the original image contains large smooth areas. In contrast, the two image-adaptive perceptual methods they propose give better visual results, since the perceptual watermarks adapt to the local regions of the image.


Another novel image watermarking approach is proposed in [35]. Kutter et al [35] first demonstrate the inability of the Weber and Michelson contrasts (see Section 2.1) to measure the contrast of natural images. If one of these contrast definitions were used on natural images, a few very bright or dark points would determine the contrast of the whole image. They define a new contrast, called isotropic local contrast, that is based on Peli's contrast definition [15] using directional analytic filters. The contrast masking phenomenon of HVS is then modeled according to the isotropic local contrast by means of visual experiments, and the weight of the watermark is adjusted according to this contrast masking model in the watermark insertion process.

3.2 Video Watermarking Methods based on Visual Models

In contrast to image watermarking, perceptual watermarking of video has not been studied in detail in watermarking research. One reason for this is the complexity and difficulty of modeling the temporal sensitivity and temporal masking of HVS. The modeling of the temporal sensitivity and masking phenomena of HVS is still an open research area [11].

Video watermarking also faces potential attacks different from those in the image and audio cases. The large amount of video data and the high similarity of frames within a scene create a vulnerable condition for attacks such as frame averaging, frame dropping, collusion, etc. Any method should be able to survive such attacks.

One simple approach to video watermarking is to watermark each frame of the video as an independent image. However, this does not solve the problem, especially for the averaging attack. In such a case, an attacker may average no-motion or slow-motion regions of the video to remove the watermark. In addition, the method can produce visible distortions in the watermarked video, since it is not based on the temporal characteristics of HVS. Moreover, the difference between two consecutive frames that are watermarked independently can be visible if the temporal characteristics of HVS are not taken into account during the watermark insertion process.

Alternatively, each frame can be watermarked with the same watermark. However, this method is vulnerable to the collusion attack: an attacker can use all the frames in the video to estimate and remove the fixed watermark. Furthermore, such a watermarking process is video independent, and the invisibility of the watermark is not guaranteed for every video, since the method is not based on the temporal characteristics of HVS, just like the first approach.

Podilchuk et al [10] proposed a method that achieves a trade-off between watermarking each frame independently and using the same watermark for every frame. The method embeds a watermark into each intra (I) frame of an MPEG sequence using the DCT-based perceptual image watermarking method [10] and then applies a simple linear interpolation of the watermarks to every frame between two consecutive I frames. If the interpolation is not performed, a visible distortion is perceived at each I frame while watching the video. The interpolation decreases the visual distortion between frames resulting from watermarking. In principle, the difference between two consecutive frames should not yield a distortion greater than the temporal contrast thresholds in the temporal frequency domain.

In another study [36], a clever method is proposed to solve the above problems of marking each frame independently or using a fixed watermark for the entire video. The proposed method is shot based. In other words, the video is separated into shots and the temporal wavelet transform of each shot is computed. A different watermark is embedded into each wavelet coefficient frame by exploiting the contrast masking and spatial masking characteristics of HVS. While the watermarks embedded into low-pass frames exist throughout the entire scene, the watermarks embedded into high-pass frames, corresponding to fast-motion regions of the video, are highly localized in time. Such a watermarking scheme solves the problems mentioned above. For example, averaging no-motion or slow-motion regions of the video only distorts the watermark embedded into the high-pass frames; the watermarks embedded into the low-pass frames survive such attacks. In addition, the method also withstands the collusion attack, since no fixed watermark is embedded into every frame of the video.

An alternative method of video watermarking is proposed in Chapter 5. We directly use temporal contrast thresholds in our scheme and show the robustness of the method to attacks such as additive Gaussian noise, ITU H.263+ coding, frame dropping and frame averaging.


CHAPTER 4

FOVEATED IMAGE WATERMARKING

The spatial resolution of the human visual system (HVS) decreases rapidly away from the point of fixation (foveation point). By exploiting this fact, a watermarking approach that embeds the watermark energy into the image periphery according to foveation-based HVS contrast thresholds is presented in this chapter.

4.1 Introduction

As already mentioned in previous chapters, the requirements for an effective watermark are imperceptibility, robustness to signal processing and intentional signal distortions, and capacity, which refers to the ability to detect the watermark among different watermarks with a low probability of error. There is an obvious trade-off between these requirements; a gain in imperceptibility will likely be lost in capacity or robustness, or vice versa. The imperceptibility criterion is directly, and the other two indirectly, related to the human visual system (HVS). Hence, researchers working on digital watermarking usually utilize visual models, which were developed in the context of image coding [3-5].

In this chapter, we utilize the foveation phenomenon of HVS. We first review the fundamentals of foveation, which were developed in the context of foveated image coding [26, 27], and then propose a watermarking scheme that exploits this phenomenon. (The basics of foveation are given in detail in Section 2.3.) We then quantify the robustness of the algorithm against some typical attacks by simulations. A well-known HVS-based method [10] is also compared with our method. In addition, the method is adapted for video. The robustness of the method against ITU H.263+ coding is tested and the method is compared with another HVS-based video watermarking method [11].


4.2 Foveation

As stated in Section 2.3, the contrast sensitivity of HVS is not uniform with respect to pixel location. The sensitivity is maximum at the point of gaze (foveation point) and decreases rapidly as the distance to the foveation point grows (Figure 4.1). This phenomenon of HVS was recently modeled in [27] by using psychovisual experimental data. For compression purposes, Wang and Bovik [26] improved this model by taking the cutoff frequency into account and defined the contrast sensitivity, S_f, as:

S_f(v, f, x) = e^{-0.0461 . f . e(v, x)}   for f ≤ f_m(x)
S_f(v, f, x) = 0                           for f > f_m(x)                        (4.1)

where x is the pixel location, v denotes the viewing distance, f gives the spatial frequency (cycles/degree), e(v, x) is the retinal eccentricity (in degrees), which refers to the visual angle Θ shown in Figure 4.1, and f_m(x) is the cutoff frequency for a given location x. As already mentioned, it is not possible to resolve frequencies above this cutoff frequency. The model in (4.1) can also be adapted to the subbands of the DWT domain by using the following equation [26]:

S_f(v, f, x) = S_f(v, r . 2^{-λ}, d_{λ,Φ}(x))   for x ∈ B_{λ,Φ}                  (4.2)

Figure 4.1  Typical viewing geometry [26]: the foveation point on the image plane, the fovea and retina of the eye, the viewing distance v, the pixel location x = (x1, x2) and the visual angle Θ.


where r gives the display resolution, λ is the decomposition level of the wavelet transform, d_{λ,Φ}(x) is the equivalent distance from the foveation point, in the spatial domain, of a wavelet coefficient at position x ∈ B_{λ,Φ}, and B_{λ,Φ} is the set of wavelet coefficient positions residing in subband (λ, Φ) [26]. The resulting contrast sensitivity is used to determine quantization levels that yield imperceptible quantization error based on HVS.
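The foveated sensitivity (4.1) can be sketched as below. The eccentricity is computed from the viewing geometry of Figure 4.1, the cutoff frequency is passed in directly rather than derived, and the unit conventions are simplifying assumptions:

```python
import numpy as np

def eccentricity(v, x, x0):
    # e(v, x): retinal eccentricity in degrees of a pixel x for a viewer at
    # distance v from the image plane, gazing at foveation point x0.
    # Pixel coordinates and v are assumed to be in the same units.
    d = np.hypot(x[0] - x0[0], x[1] - x0[1])
    return np.degrees(np.arctan2(d, v))

def sensitivity(v, f, x, x0, f_cutoff):
    # Equation (4.1): S_f = exp(-0.0461 * f * e(v, x)) for f <= f_m(x),
    # and 0 above the cutoff frequency f_m(x) (supplied here as f_cutoff).
    if f > f_cutoff:
        return 0.0
    return float(np.exp(-0.0461 * f * eccentricity(v, x, x0)))
```

At the foveation point the eccentricity is zero, so S_f = 1 for any resolvable frequency; moving the pixel away from the gaze point shrinks the sensitivity exponentially.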

4.3 Proposed Watermarking Method

Recall that an important principle of watermarking is to embed the watermark into perceptually significant portions of the image, so that the resulting image is more robust to attacks [7]. Since the perceptually significant part of the image is the region around the foveation point, the watermark should be embedded mostly into this part. On the other hand, the strength of the watermark in the periphery can be chosen higher than in the foveated region, since the contrast threshold levels that can be noticed by HVS are higher in those regions. It will be shown that these two requirements can be satisfied with a single method.

In the proposed method, we take an approach similar to conventional HVS-based image watermarking [10,11]. T_{λ,Φ}, the contrast threshold value [20] for subband level λ and orientation Φ in the DWT domain, is an important parameter for these methods; it defines the frequency sensitivity of HVS in the different subbands. T_{λ,Φ} is obtained by subjective experiments and should be weighted based on foveation. Hence, we first define a contrast threshold weight function, T_f(v, f, x), by using the sensitivity function S_f in (4.1):

T_f(v, f, x) = 1 / S_f(v, f, x)    for f ≤ f_m(x)
T_f(v, f, x) = 1 / S_f(v, f, x~)   for f > f_m(x),  where f_m(x~) = f            (4.3)

Note that T_f(v, f, x) is equal to 1 at the foveation point and its value gets larger as the distance from the foveation point increases. It reaches its maximum at x = x~ and is assumed to remain constant after that point. In Figure 4.2, T_f(v, f, x) is illustrated, with dark regions showing low threshold values.


Figure 4.2  Contrast threshold weight function (the circle center is the foveation point; the threshold is at its minimum in the dark regions)

Using T_f(v, f, x), one may adapt the contrast threshold weight function to the subbands of the DWT domain with a formulation similar to (4.2):

T_f(v, f, x) = T_f(v, r . 2^{-λ}, d_{λ,Φ}(x))   for x ∈ B_{λ,Φ}                  (4.4)

In order to include the effect of T_{λ,Φ} in the above formulation, this parameter should be multiplied by T_f(v, f, x) for the different subbands in the DWT domain, finally giving the contrast thresholds as:

T_{λ,Φ}(v, r . 2^{-λ}, d_{λ,Φ}(x)) = T_{λ,Φ} . T_f(v, r . 2^{-λ}, d_{λ,Φ}(x))    (4.5)

For the proposed method, the watermark embedding and detection processes are similar to [5], except that the location-dependent thresholds T_{λ,Φ}(v, r . 2^{-λ}, d_{λ,Φ}(x)) are used instead of a constant T_{λ,Φ} for each subband. The algorithm can be summarized as follows (for notational simplicity, T_{λ,Φ}(v, r . 2^{-λ}, d_{λ,Φ}(x)) is replaced with t_{x,λ,Φ}):

1. Decompose the image into multiple subbands using 9-7 biorthogonal filters [38].
2. Compute t_{x,λ,Φ} for each subband by (4.5).
3. Embed the watermark by using:

I*_{x,λ,Φ} = I_{x,λ,Φ} + t_{x,λ,Φ} . w_{x,λ,Φ}   if I_{x,λ,Φ} > t_{x,λ,Φ}
I*_{x,λ,Φ} = I_{x,λ,Φ}                           otherwise                       (4.6)

where I_{x,λ,Φ} is the wavelet coefficient at position x, I*_{x,λ,Φ} is the corresponding watermarked coefficient and w_{x,λ,Φ} is a watermark sequence.

Figure 4.3 illustrates the difference between the proposed method and the previous HVS-based method [10]. While the previous method inserts the watermark according to T_{λ,Φ}, the proposed method embeds the watermark according to t_{x,λ,Φ}. Obviously, the number of watermarked coefficients in the proposed method is lower than in the previous method, while the strength of the watermark embedded according to t_{x,λ,Φ} is greater than that embedded according to T_{λ,Φ}. Since t_{x,λ,Φ} increases with increasing eccentricity, the proposed method yields more distortion in the periphery. However, this distortion is imperceptible for a human gazing at the fixation point. On the other hand, the detection of such a watermark obviously improves, since the overall watermark energy is greater than in the previous case. Moreover, the coefficients that are greater than a threshold usually belong to perceptually significant portions.


Figure 4.3  Illustration of the difference between the previous HVS-based method and the proposed method. The plot shows the magnitudes of a representative number of coefficients in subband (λ, Φ), T_{λ,Φ} and t_{x,λ,Φ} with respect to the eccentricity e(v, x).


4.4 Adaptation of the Method to Video

A first approach to adapting the method to video is to watermark each frame by using the proposed method for images. However, the resulting video will have some temporal degradation, such that a human observer can differentiate the original video from the watermarked one. The visible temporal degradations in the watermarked video are due to the change of the contrast threshold values in the spatial domain caused by the temporal masking phenomena of HVS [11]. In order to overcome this problem, the watermark can be embedded only into the intra frames of the video. The other frames between every intra frame pair are watermarked by linearly interpolating, in the spatial domain, between the two watermarks embedded into the intra frames [11]. The difference between any two frames will be smaller after such an interpolation and hence HVS cannot differentiate the degradations in the video.
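The interpolation step can be sketched as follows; the frame size and the number of in-between frames are arbitrary here:

```python
import numpy as np

def interpolated_watermarks(w_intra_a, w_intra_b, n_between):
    # Linear interpolation in the spatial domain between the watermarks of
    # two consecutive intra frames, so that consecutive frames differ only
    # gradually and the temporal change stays small.
    frames = []
    for k in range(1, n_between + 1):
        alpha = k / (n_between + 1)
        frames.append((1.0 - alpha) * w_intra_a + alpha * w_intra_b)
    return frames

wa = np.zeros(4)          # watermark of the first intra frame
wb = np.ones(4)           # watermark of the next intra frame
between = interpolated_watermarks(wa, wb, 3)
```

With three in-between frames, the blending weights are 0.25, 0.5 and 0.75, so the watermark drifts smoothly from one intra frame to the next instead of jumping at each I frame.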

4.5 Experimental Results

In all the simulations, the contrast thresholds given in [20] are utilized. For all images, a single foveation point, the center of the image, is assumed. The watermark signal is generated from a zero-mean, unit-variance Gaussian distribution.

The watermark detection is based on classical detection theory, the same approach as in [7,10,11]. The original signal is subtracted from the received image, and the normalized correlation between the signal difference and the original watermark is computed. First, the original watermark and the extracted watermark are normalized to unit magnitude and then the inner product between them is computed [40]. The result is compared to a threshold. If the result is greater than the threshold, the watermark is detected; otherwise, it is not. The reason for using normalized correlation is its robustness against attacks such as changing the brightness of the image [40]. With such a method, the correlation result becomes less dependent on the magnitudes of the original and extracted watermarks.
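A minimal sketch of this detector, assuming the original image is available (non-blind detection); the decision threshold of 0.17 is taken from the experiments, and the random signals below merely simulate an image and a watermark:

```python
import numpy as np

def detect(received, original, watermark, threshold=0.17):
    # Subtract the original from the received image, normalize both the
    # difference signal and the reference watermark to unit magnitude, and
    # compare their inner product to the decision threshold.
    extracted = (received - original).ravel()
    extracted = extracted / np.linalg.norm(extracted)
    reference = watermark.ravel() / np.linalg.norm(watermark)
    rho = float(np.dot(extracted, reference))
    return rho, rho > threshold

rng = np.random.default_rng(1)
original = rng.standard_normal(64)
w = rng.standard_normal(64)
rho, present = detect(original + 0.5 * w, original, w)
```

Because both signals are scaled to unit magnitude, a noise-free watermarked copy yields rho = 1 regardless of the embedding strength, which is exactly the amplitude insensitivity the text describes.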

A typical result for the Lena image is given in Figure 4.4(b). The regions around the foveation point in the original and watermarked images are shown in Figure 4.4(c) and Figure 4.4(d), respectively, to present the perceptual equivalence of these regions. The periphery regions in the watermarked image are degraded, as expected, due to the larger thresholds in those regions (see Figure 4.4(e) and Figure 4.4(f)). An interesting example is given in Figure 4.5 for the Peppers image: although the strength of the watermark is higher in the periphery, it is not possible to sense the difference between the original and watermarked images even if a viewer gazes at the periphery. For the Lena, Harbour, Peppers, Airfield and Bridge images, the correlation results against cropping, additive Gaussian noise with different variances and JPEG compression are tabulated in Tables 4.1, 4.2 and 4.3, respectively. In order to determine a threshold level for detecting the watermark, the extracted watermark is correlated with 1000 other randomly generated watermarks, in a similar way as in [10]. The resulting correlation coefficients lie between -0.17 and 0.17 for the Lena image. The results show that the watermark can be detected even for JPEG compression at quality 0.05, cropping of 1/16 and additive Gaussian noise resulting in a PSNR of 14 dB between the images.
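The threshold-estimation experiment (correlating the extracted watermark against many random watermarks and bounding the spread, as in [10]) can be sketched as follows; the implementation and function name are ours:

```python
import numpy as np

def empirical_threshold_range(w_extracted, n_trials=1000, seed=0):
    """Correlate the extracted watermark with randomly generated watermarks
    and return the largest absolute correlation observed; the detection
    threshold is then placed above this value."""
    rng = np.random.default_rng(seed)
    v = w_extracted.ravel() / np.linalg.norm(w_extracted)
    worst = 0.0
    for _ in range(n_trials):
        r = rng.normal(size=v.size)
        worst = max(worst, abs(float(np.dot(v, r / np.linalg.norm(r)))))
    return worst
```

For high-dimensional watermarks the false correlations concentrate near zero, which is why a fixed threshold just above the observed range (e.g. 0.17 for Lena above) separates the two hypotheses.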

The wavelet-based watermarking method proposed in [10] is also implemented for comparison. The robustness tests of the wavelet-based method [10] against cropping, additive Gaussian noise and JPEG compression are also shown in Tables 4.1, 4.2 and 4.3, respectively. The results indicate a better performance for the proposed method over [10] in terms of correlation. For the proposed method, the trade-off between imperceptibility and robustness is managed by using the foveation phenomenon of HVS.

For the video simulations, the carphone.qcif sequence is used. The robustness of the proposed method against the ITU H.263+ coding standard is tested: the watermarked video is passed through H.263+ coding at different bit rates. The results for both the proposed method and the compared method [11] are given in Table 4.4.

As the simulation results indicate, the foveation phenomenon of HVS can be successfully applied to watermarking to improve robustness. The simulation results also show that foveation-based watermarking yields an improvement over previous HVS-based watermarking methods. The real benefit of foveated watermarking should be expected in video watermarking.


Figure 4.4: (a) Original image; (b) watermarked image according to the proposed method; (c) foveated region in (a); (d) foveated region in (b); (e) periphery region in (a); (f) periphery region in (b).


Figure 4.5: (a) Original image, (b) watermarked image according to the proposed method, (c) watermarked image according to the previous HVS-based method [10], (d) the amplified difference image between (a) and (b), (e) the amplified difference image between (a) and (c). The brighter points in the periphery of (d) compared to (e) are due to the strength of the watermark, which is embedded according to foveation-based thresholds.


Cropping   Algorithm    Lena   Harb   Pepp   Airf   Brid
1/4        Proposed     0.97   0.89   0.95   0.87   0.88
1/4        IA-W [11]    0.65   0.55   0.54   0.56   0.50
1/16       Proposed     0.69   0.64   0.77   0.54   0.61
1/16       IA-W [11]    0.33   0.33   0.30   0.31   0.25

Table 4.1: Correlation results against cropping. (The indicated fraction of the watermarked image is cropped. In the detection process, the rest of the image is completed with the original.)

PSNR (dB) of resulting "Lena" image   14     17     20     25     31     37     40
Proposed                              0.30   0.47   0.56   0.78   0.92   0.98   0.99
IA-W [11]                             0.19   0.31   0.40   0.63   0.79   0.83   0.96

Table 4.2: Correlation results against additive Gaussian noise.

                        Quality factor
Image   Algorithm    80     60     40     20     10     5
Lena    Proposed     0.89   0.87   0.85   0.68   0.58   0.34
Lena    IA-W [5]     0.70   0.66   0.62   0.50   0.30   0.16
Harb    Proposed     0.98   0.97   0.93   0.82   0.55   0.30
Harb    IA-W [5]     0.95   0.89   0.79   0.54   0.31   0.16
Pepp    Proposed     0.98   0.97   0.95   0.84   0.58   0.30
Pepp    IA-W [5]     0.95   0.90   0.81   0.53   0.27   0.14
Airf    Proposed     0.98   0.96   0.93   0.83   0.57   0.22
Airf    IA-W [5]     0.94   0.88   0.78   0.55   0.32   0.19
Brid    Proposed     0.98   0.97   0.95   0.86   0.66   0.35
Brid    IA-W [5]     0.96   0.89   0.81   0.56   0.30   0.16

Table 4.3: Correlation results against JPEG compression [10].


Table 4.4: Correlation results against ITU H.263+ coding at different bit rates. IA-W: HVS-based video watermarking method proposed in [11]. The results are given for the Carphone sequence.

CHAPTER 5

TEMPORAL WATERMARKING OF DIGITAL VIDEO

This chapter presents a watermarking approach to embed copyright protection into digital video. The approach requires the original video to detect the watermark and exploits temporal contrast thresholds to determine both the locations where the watermark should be embedded and the maximum strength of the watermark that still gives imperceptible distortion after watermark insertion.

5.1 Introduction

In Section 3.2, the problems encountered in video watermarking are discussed. Briefly, there are two main problems in video watermarking that make the situation different from image watermarking. The first is to guarantee robustness against attacks such as frame dropping, frame averaging and collusion, which have no counterparts in the image watermarking case. The second is to provide the imperceptibility of the watermark, which is a relatively more difficult problem compared to the image case due to the three-dimensional characteristics of video: the watermarking procedure should also take the variations in the temporal direction into account to provide an imperceptible watermark. Two solutions in the literature, which are based on HVS, are also given in Section 3.2.

In this chapter, we propose an alternative method that exploits the temporal contrast thresholds of HVS. For a grating of a specific spatial frequency, the temporal contrast threshold refers to the minimum amplitude of a sinusoidal modulation of a specific temporal frequency at which the temporal variations in the visual target become visible (see Figure 2.20). Therefore, by definition, modifications in the temporal direction of the target that are smaller than the temporal contrast threshold will be invisible. In other words, the temporal contrast thresholds determine the maximum level of the watermark that can be embedded into the video in the temporal direction.

In Section 2.4, temporal contrast thresholds are denoted as T(u, v, w). This notation shows the temporal contrast threshold where the visual target presented to a subject is a grating of spatial DCT frequency (u, v), modulated in the temporal direction with a sinusoidal function of temporal frequency w. These T(u, v, w) thresholds are determined in [33]. In order to exploit this data, the video should also be expressed as a function of the spatial frequencies (u, v) and the temporal frequency w. In other words, the video should be converted from the (x, y, t) domain, where (x, y) denotes the horizontal and vertical spatial directions and t denotes the temporal direction, to the transform domain (u, v, w). This can be interpreted as decomposing the video into spatiotemporal frequency components. The first component, w = 0 (the DC component), corresponds to the average of the video in the temporal direction; the components with lower w values correspond to regions of the video with no or little motion; and higher w values correspond to the high-motion regions of the video. The proposed method designs the video watermark according to T(u, v, w) and embeds the watermark into each of those frequency components. With such an approach, the watermark embedded into the low frequency components exists throughout the video scene, whereas the data embedded into the high frequency components are highly localized in time and change rapidly from frame to frame.

Such a method is expected to eliminate the stated problems of video watermarking. Embedding the watermark into the video using temporal contrast thresholds solves the invisibility problem. In addition, the proposed method is expected to be robust against attacks such as averaging of regions without motion, since all of these regions correspond to the first (DC) frame of the temporal Fourier transform of the video, into which only one watermark is inserted. The proposed method is also expected to be robust against attacks like frame dropping and frame averaging, since these attacks mainly distort the high frequency components of the video and do not affect the low frequency components considerably; hence, the watermark embedded into the low frequency components survives such attacks. Finally, the problem of the collusion attack is also solved by the proposed method, since none of the frames includes the same watermark when the watermark insertion is made in the (u, v, w) transform domain.

� 61�

5.2 Watermarking Procedure

The overall structure of the watermarking procedure is given in Figure 5.1. The first step is to separate the video into shots, where a shot is defined as a continuous recording of a single camera [39]. For each shot, the intensities are converted into contrast values. As noted in Section 2.4, the contrast is defined as the ratio of the target intensity, I_T(x, y, t), to the background intensity, I_B:

    C(x, y, t) = I_T(x, y, t) / I_B                                                (5.1)

The background intensity, I_B, is the time-space average of the video scene, so C(x, y, t) can be written as:

    C(x, y, t) = (I(x, y, t) - mean(I(x, y, t))) / mean(I(x, y, t))                (5.2)

After this point, one has a contrast video, C(x, y, t), rather than an intensity video, I(x, y, t).
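The intensity-to-contrast conversion of Eqn. (5.2) amounts to one line of numpy (a minimal sketch; the function name is ours):

```python
import numpy as np

def to_contrast(intensity_video):
    """Convert an intensity video I(x, y, t) into a contrast video
    C(x, y, t) = (I - mean(I)) / mean(I), where mean(I) is the time-space
    average of the whole shot (Eqn. (5.2))."""
    mean_intensity = intensity_video.mean()
    return (intensity_video - mean_intensity) / mean_intensity
```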

Figure 5.1: Overall structure of the watermarking process.


C(x, y, t) should be transformed into the (u, v, w) domain to exploit the temporal contrast thresholds T(u, v, w). For this purpose, each frame of C(x, y, t) is divided into 8x8 blocks and the DCT of each block is calculated. The signal at this point is denoted as C(bx, by, u, v, t), where bx and by are the block indices in the horizontal and vertical directions, respectively, u and v are the horizontal and vertical spatial frequencies, respectively, and t is time. The next step is to take the Fourier transform of C(bx, by, u, v, t) in the temporal direction, which results in C(bx, by, u, v, w).
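The transform chain (8x8 block DCT per frame, then a temporal DFT) can be sketched with scipy (a minimal illustration; the axis ordering and orthonormal DCT normalization are our choices, not specified by the text):

```python
import numpy as np
from scipy.fft import dctn, fft

def to_transform_domain(contrast_video, block=8):
    """Map C(x, y, t) to C(bx, by, u, v, w): split each frame into 8x8
    blocks, take the 2-D DCT of every block, then take the DFT along the
    temporal axis."""
    T, H, W = contrast_video.shape
    by, bx = H // block, W // block
    # shape (t, by, u, bx, v): axes 2 and 4 index pixels inside each block
    blocks = contrast_video[:, :by * block, :bx * block].reshape(
        T, by, block, bx, block)
    spatial = dctn(blocks, axes=(2, 4), norm='ortho')  # per-block 2-D DCT
    return fft(spatial, axis=0)                        # temporal DFT -> w
```

The two transforms act on disjoint axes, so their order does not matter; the w = 0 slice of the result is the temporal sum (average, up to a scale) of the block DCTs.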

An important criterion while embedding a watermark into digital content is to embed it into the perceptually significant part. Since most common signal processing and geometric attacks affect the perceptually insignificant parts of the digital content [7], such an approach makes the watermark more robust. In this case, the digital content is video, and the perceptually significant parts are represented by the coefficients of C(bx, by, u, v, w) that are greater than the temporal contrast thresholds T(u, v, w). The smaller coefficients are not significant, since HVS will not sense them, and these parts will probably be eliminated by any lossy compression, such as the ITU H.263+ or MPEG video coding standards.

Another important point during watermark insertion is to take the trade-off between robustness and imperceptibility into account [7]. An increase in robustness performance might yield a decrease in imperceptibility. In order not to affect the imperceptibility of the watermark, its strength should not exceed the temporal contrast thresholds.

By using these two facts, the watermark insertion is described by the relation

    C*(bx, by, u, v, w) = C(bx, by, u, v, w) + W(bx, by, u, v, w) . T(u, v, w),   if C(bx, by, u, v, w) >= T(u, v, w)
                        = C(bx, by, u, v, w),                                     otherwise                            (5.3)

where C*(bx, by, u, v, w) denotes the watermarked coefficients and W(bx, by, u, v, w) is the watermark sequence. As can be observed from (5.3), the watermark is inserted into the magnitude of the transform coefficients only. The condition in (5.3) satisfies the first criterion, that is, to embed the watermark into the significant part of the video. Scaling the watermark by T(u, v, w) satisfies the requirement that the strength of the watermark should not exceed the temporal contrast thresholds, in order to remain invisible, provided that the absolute value of W(bx, by, u, v, w) is not greater than 1. However, if the watermark is chosen uniformly from a restricted interval, e.g., [-1, 1], then the watermark will be vulnerable to multiple-document (collusion) attacks.
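Insertion rule (5.3) can be sketched as follows (a minimal numpy illustration; we apply the rule to the coefficient magnitudes while keeping the phase, as the text specifies, and the function name is ours):

```python
import numpy as np

def embed(C, W, T):
    """Eqn. (5.3): add W * T to the magnitude of every coefficient whose
    magnitude reaches the temporal contrast threshold; leave the
    perceptually insignificant coefficients untouched."""
    mag = np.abs(C)
    phase = np.exp(1j * np.angle(C))
    new_mag = np.where(mag >= T, mag + W * T, mag)
    return new_mag * phase
```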

A multiple-document attack uses multiple watermarked copies D'_1, D'_2, ..., D'_t of a document D to produce an unwatermarked document D* [7]. In order to eliminate this problem, the watermark is generated from a zero-mean, unit-variance Gaussian distribution, and the temporal contrast thresholds are divided by the mean of the maximum values of 1000 such watermarks of size 176 x 144 (the frame size of the QCIF sequences). In this manner, the watermark signal will mostly be lower than 1, and the signal added to the video will not exceed the contrast thresholds by much.

5.3 Watermark Detection

The overall structure of the watermark detection process is illustrated in Figure 5.2. The detection is based on the calculation of the normalized correlation between the original watermark and the watermark extracted from a video that has passed through some signal processing operations, such as additive Gaussian noise, video coding, frame dropping or frame averaging. The normalized correlation is compared to a threshold: if it is greater than the threshold, the watermark is assumed to be detected; otherwise, the watermark is not detected.

Important considerations while determining the threshold level are the false positive and false negative probabilities. A false positive occurs when a watermark detector indicates the presence of a watermark in an unwatermarked video; the false positive probability is the likelihood of such an occurrence [40]. On the other hand, a false negative occurs when a watermark detector fails to detect a watermark that is present [40]. When the threshold level is increased, the false positive probability decreases, whereas the false negative probability increases. Therefore, the threshold level is determined by taking the trade-off between the false positive and false negative probabilities into account. In order to determine such a threshold level, the correlation is calculated for both the watermarked and unwatermarked cases. The process is repeated a significant number of times, and the minimum of the correlation results for the watermarked video and the maximum of the correlation results for the unwatermarked video are determined. These two levels should be separated from each other as far as possible, in order to avoid both false positives and false negatives.


In the watermark detection process, the watermarked video, which has passed through some signal processing operations, is separated into shots, and the video signal for a shot is denoted as I*(x, y, t). The aim is to extract the watermark from I*(x, y, t).

Figure 5.2: Overall structure of the watermark detection process.

The first step in the detection process is to convert I*(x, y, t) into the transform domain where the watermark was inserted; Figure 5.2(a) illustrates this case. The transformed signal is denoted as C*(bx, by, u, v, w). C(bx, by, u, v, w) is subtracted from C*(bx, by, u, v, w), and the difference is divided by T(u, v, w). In mathematical terms, the operation is as follows:

    D(bx, by, u, v, w) = C*(bx, by, u, v, w) - C(bx, by, u, v, w)                  (5.4)

    W*(bx, by, u, v, w) = D(bx, by, u, v, w) / T(u, v, w)
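Eqn. (5.4) is a plain element-wise operation (a sketch; the coefficient arrays here are assumed to hold the magnitudes the watermark was added to):

```python
import numpy as np

def extract_watermark(C_star, C, T):
    """Eqn. (5.4): subtract the original coefficients from the received
    ones and scale the difference back by the temporal contrast
    thresholds, giving the extracted watermark W*."""
    return (C_star - C) / T
```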

W*(bx, by, u, v, w) is the extracted watermark, and the normalized correlation between W*(bx, by, u, v, w) and W(bx, by, u, v, w) should be found. The correlation is first computed for each discrete frequency w, and then the mean in the w direction is taken:

    v1(bx, by, u, v, w) = W(bx, by, u, v, w) / ||W(bx, by, u, v, w)||

    v2(bx, by, u, v, w) = W*(bx, by, u, v, w) / ||W*(bx, by, u, v, w)||            (5.5)

    p(w) = <v1, v2>    (i.e., the inner product of v1 and v2 at frequency w)

    correlation = mean(p(w))

Finally, the mean is compared to a threshold for detection.
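The detection statistic of Eqn. (5.5) can be sketched as follows (minimal numpy; indexing the arrays with the temporal frequency w on the first axis is an assumption of this sketch):

```python
import numpy as np

def correlation(W_star, W):
    """Eqn. (5.5): normalize the original and extracted watermarks to unit
    magnitude at each temporal frequency w, take the inner product p(w),
    then average over w."""
    p = []
    for w in range(W.shape[0]):
        v1 = W[w].ravel() / np.linalg.norm(W[w])
        v2 = W_star[w].ravel() / np.linalg.norm(W_star[w])
        p.append(float(np.dot(v1, v2)))
    return float(np.mean(p))
```

Normalizing per frequency means every temporal band contributes equally to the statistic, regardless of how much watermark energy it carries.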


5.4 Simulation Results

The sequences utilized in the simulations are the Coastguard and Carphone sequences. Each frame is 176x144. Only the first 60 frames of each sequence are used, and the watermark is embedded into the Y component only.

An original frame from each video is illustrated in Figures 5.3(a) and 5.4(a), and the corresponding watermarked frames are illustrated in Figures 5.3(b) and 5.4(b). The PSNR values between the original and watermarked frames are 39.9 dB and 39.6 dB, respectively. As illustrated, the original and watermarked frames are visually indistinguishable. However, the visual equivalence of a watermarked frame and an original frame does not imply the visual equivalence of the watermarked video and the original video: as noted, differences between the watermarked and original videos can become visible due to the temporal characteristics of the video. For this reason, the watermarked and original videos were presented to a number of subjects, who were asked whether they could sense a difference between the two videos. According to these informal tests, the videos are assumed to be visually equal.

The number of coefficients to be watermarked differs according to the motion content of the video. The number of watermarked coefficients in each frame of the temporal discrete Fourier transform of the two videos is illustrated in Figure 5.5. The number of watermarked coefficients decreases as the frequency increases, for two reasons: first, the temporal contrast threshold levels increase with increasing temporal frequency; second, the magnitudes of the temporal discrete Fourier transform of the videos decrease. Due to these reasons, the number of coefficients that are greater than the temporal contrast thresholds decreases as the frequency increases. In Figure 5.6, the magnitude of the difference between the temporal discrete Fourier transform magnitudes of the original and watermarked video sequences, i.e., the magnitude of (C*(bx, by, u, v, w) - C(bx, by, u, v, w)), is illustrated for 4 different discrete frequencies w. As w increases, the watermarked coefficients become the ones that correspond to the high-motion regions of the video. For w = 0 (the DC case), most of the low spatial frequency elements of the 8x8 blocks are watermarked.

Figure 5.3: Frame from the Coast video: (a) original frame, (b) watermarked frame. The PSNR between the watermarked and original frames is 39.9 dB.

Figure 5.4: Frame from the Carphone video: (a) original frame, (b) watermarked frame. The PSNR between the watermarked and original frames is 39.6 dB.


Figure 5.5: The number of watermarked coefficients vs. discrete temporal frequency for (a) the Coast qcif sequence and (b) the Carphone qcif sequence. The total number of watermarked coefficients is 70270 for the Coast sequence and 45563 for the Carphone sequence.

Figure 5.6: Illustration of where the watermark is embedded in the temporal frequency domain: (a) w = 0, (b) w = 8, (c) w = 20, (d) w = 26, where w is the discrete temporal frequency. The number of watermarked coefficients decreases as w increases. The plots are given for the Carphone sequence. (w is the discrete temporal frequency corresponding to the continuous frequency w·w_s/(N-1) for an N-point discrete Fourier transform with sampling frequency w_s; as noted, N is 60 and w_s is 30 Hz.)


During the robustness simulations, the same signal processing operation is applied to both the original and the watermarked video. The watermark embedding and detection processes are repeated 30 times with different watermarks in each case. The minimum of the correlation results when the watermark is present in the video and the maximum of the correlation results when the watermark is not present are determined; a larger distance between these minimum and maximum values indicates a more robust system.

5.4.1 Robustness to Additive Gaussian Noise

In order to model video coding techniques that are based on the temporal sensitivity of HVS (such as 3-D transform coding), the watermarked video is corrupted with additive Gaussian noise in the temporal frequency domain, after multiplying the noise with the temporal contrast thresholds:

    NW(bx, by, u, v, w) = C*(bx, by, u, v, w) + N(bx, by, u, v, w) . T(u, v, w)    (5.6)

where NW(bx, by, u, v, w) is the noise-added watermarked coefficient and N(bx, by, u, v, w) is additive Gaussian noise with zero mean and a variance of 0.1. In Table 5.1, the correlation results for each video sequence with and without the watermark are given; the maximum, mean and minimum correlation results are computed over all 30 runs. It is important to note that the minimum correlation values with the watermark are much larger than the maximum correlation values without the watermark. The mean of the inner product results (see Eqn. (5.5)) over the 30 runs is drawn as a function of the discrete temporal frequency in Figure 5.7 for each video sequence, with and without the watermark. It is clearly seen that at each temporal frequency, the difference between the correlations for the watermarked and unwatermarked cases is quite high.
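The noise model of Eqn. (5.6) scales the noise by the thresholds before adding it in the transform domain (a sketch; the variance of 0.1 follows the text, the function name is ours):

```python
import numpy as np

def add_shaped_noise(C_star, T, variance=0.1, seed=0):
    """Eqn. (5.6): corrupt the watermarked coefficients with zero-mean
    Gaussian noise whose amplitude is shaped by the temporal contrast
    thresholds T(u, v, w)."""
    rng = np.random.default_rng(seed)
    N = rng.normal(0.0, np.sqrt(variance), size=C_star.shape)
    return C_star + N * T
```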

Table 5.1: Correlation results for the Coast and Carphone sequences after additive Gaussian noise.

                       With watermark             Without watermark
Video     PSNR (dB)    Max      Mean     Min      Max      Mean      Min
Coast     26.9         0.9721   0.9670   0.9648   0.0256   0.0049    -0.0164
Carphone  27.8         0.9822   0.9810   0.9776   0.0065   -0.0025   -0.0132


Figure 5.7: Mean of the inner product results (see Eqn. (5.5)) as a function of the discrete temporal frequency after additive Gaussian noise. The graph is drawn for the Coast qcif sequence; 'x's show the correlation results for the watermarked video and 'o's show the correlation results for the original video.

5.4.2 Robustness to ITU H.263+ Coding

One of the most probable signal processing operations for video is a lossy coding stage, applied for the storage and transmission of digital video at low bit rates. The robustness of the watermarking method against ITU H.263+ coding is tested at different bit rates. In the testing process, one of every five consecutive frames is set as an intra frame, and the test is repeated for different quantization levels. The bit rate is decreased down to 240 kbps by increasing the quantization level. The watermark survives down to a bit rate of 230-240 kbps. The correlation results are illustrated in Table 5.2 for the Coastguard and Carphone sequences. In Figure 5.8, the inner product results for each temporal frequency are illustrated: while the inner product (see (5.5)) for the DC term (w = 0) is very high, the ones for the AC terms are quite low, since the coding distorts mostly the AC terms. Below 230 kbps, further compression makes the watermark undetectable.


Table 5.2: Correlation results for the Coast and Carphone sequences after ITU H.263+ coding.

                                       With watermark             Without watermark
Video     Bit rate (kbps)  PSNR (dB)   Max      Mean     Min      Max      Mean      Min
Coast     230              29.4        0.1796   0.1416   0.1291   0.0282   -0.0084   -0.0241
Carphone  246              34.5        0.2238   0.2093   0.1929   0.0094   0.0016    -0.0098

Figure 5.8: Mean of the inner product results (see Eqn. (5.5)) as a function of the discrete temporal frequency after ITU H.263+ coding at a bit rate of 230 kbps. The graph is drawn for the Coast qcif sequence; 'x's show the correlation results for the watermarked video and 'o's show the correlation results for the original video.

5.4.3 Robustness to Frame Dropping and Frame Averaging

Some distortions based on the temporal characteristics of digital video are temporal cropping, frame dropping and frame interpolation. An attacker can preserve the visual quality of the digital video while dropping some frames from the video and/or replacing them by frame interpolation. The robustness of the watermarking method to such attacks is tested in this part. For the frame dropping case, one of every two consecutive frames is dropped, which is close to the worst case that still probably maintains the visual quality. For the frame interpolation case, one of every two consecutive frames is dropped and replaced by the average of its two neighboring frames. Each of these attacks mainly distorts the high frequency components of the video; therefore, only the low frequency components (the first 15 components) are taken into account while computing the correlation in the detection part. The correlation results for frame dropping and frame averaging are illustrated in Tables 5.3 and 5.4, respectively. The inner product results vs. discrete temporal frequency are illustrated in Figures 5.9 and 5.10 for frame dropping and frame averaging, respectively. Especially in Figure 5.10, the fact that frame averaging mainly distorts the high frequency components is obvious, since the inner product results decrease steadily as the frequency increases.
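The two attacks can be sketched as follows (a minimal numpy illustration; the function names and the choice to repeat the surviving frame are ours):

```python
import numpy as np

def drop_every_other(video):
    """Frame dropping: drop one of every two consecutive frames and repeat
    the surviving frame, keeping the frame count constant."""
    attacked = video.copy()
    kept = video[0::2]
    attacked[1::2] = kept[:attacked[1::2].shape[0]]
    return attacked

def average_odd_frames(video):
    """Frame averaging: replace each odd-index frame by the average of its
    two neighboring frames."""
    attacked = video.copy()
    for t in range(1, len(video) - 1, 2):
        attacked[t] = 0.5 * (video[t - 1] + video[t + 1])
    return attacked
```

Both attacks leave the temporal average (the DC component) nearly intact while smoothing out frame-to-frame variation, which is why only the low-frequency correlations are used in the detection above.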

Table 5.3: Correlation results for the Coast and Carphone qcif sequences after frame dropping. One frame from each two consecutive frames is dropped.

          With watermark             Without watermark
Video     Max      Mean     Min      Max      Mean     Min
Coast     0.3983   0.3925   0.3863   0.0079   0.0011   -0.0072
Carphone  0.2410   0.2287   0.2105   0.0010   0.0014   -0.0084

Table 5.4: Correlation results for the Coast and Carphone qcif sequences after frame averaging. The odd-index frames are dropped and replaced by the average of the two neighboring frames.

          With watermark             Without watermark
Video     Max      Mean     Min      Max      Mean      Min
Coast     0.4295   0.4215   0.4127   0.0065   -0.0016   -0.0123
Carphone  0.2737   0.2624   0.2537   0.0114   0.0004    -0.0068


Figure 5.9: Mean of the inner product results (see Eqn. (5.5)) as a function of the discrete temporal frequency after frame dropping. One frame from each two consecutive frames is dropped. The graph is drawn for the Coast qcif sequence; 'x's show the correlation results for the watermarked video and 'o's show the correlation results for the original video.

Figure 5.10: Mean of the inner product results (see Eqn. (5.5)) as a function of the discrete temporal frequency after frame averaging. The odd-index frames are dropped and replaced by the average of the two neighboring frames. The graph is drawn for the Coast qcif sequence; 'x's show the correlation results for the watermarked video and 'o's show the correlation results for the original video.

CHAPTER 6

SUMMARY AND DISCUSSIONS

Two new watermarking methods, which consider HVS in their formulation, have been proposed in this thesis.

The first method, based on the foveation phenomenon of HVS, embeds the watermark into the periphery with the assumption that the human eye gazes at the center of the image. With such an assumption, the visual difference between the original image and the watermarked image cannot be sensed by HVS. The robustness results show that the watermarking scheme can survive attacks such as additive Gaussian noise, JPEG compression and cropping. In addition, it shows better performance with respect to previous HVS-based watermarking methods. These robustness results are to be expected, since more watermark energy is embedded into the periphery regions of the image: the overall watermark energy embedded into the image increases and, obviously, detecting a watermark with more energy becomes easier. However, the subjective quality still does not change.

One important point to note is the assumption that the center of the image is the gazing point of the HVS. This assumption is made only for simulation purposes. It is possible to extend the scheme to multiple foveation points, which is the more usual case in daily life: while one person gazes at one point while watching TV, another person may gaze at a different point. In fact, the difficulty is not extending the method to multiple foveation points, but determining the foveation points themselves. For this purpose, the watermarking scheme can be integrated with an image understanding system. For example, a human face, once recognized, is likely to be a foveation point, and high-motion regions of a video are more likely to attract attention than slow-motion regions. Hence, an image understanding system that detects human faces or high-motion regions in video can be used as a front end to the watermarking scheme.
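The extension to multiple foveation points amounts to treating a pixel as peripheral only when it is far from every gaze point. A minimal sketch, assuming the gaze points (e.g. detected faces or high-motion regions) are already supplied by such an image understanding system; `foveation_weight` is a name introduced here for illustration.

```python
import numpy as np

def foveation_weight(shape, points):
    """Peripheral weight map for several gaze points: the weight of a
    pixel is set by its distance to the NEAREST foveation point."""
    h, w = shape
    yy, xx = np.mgrid[0:h, 0:w]
    dists = [np.hypot(yy - py, xx - px) for py, px in points]
    weight = np.minimum.reduce(dists)   # distance to the closest gaze point
    return weight / weight.max()        # normalized to [0, 1]
```

With a single point at the image center this reduces to the weighting used in the proposed method; every additional gaze point simply carves out another low-energy region.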


The second proposed method, which is tailored for video watermarking, is based on the temporal sensitivity of the HVS. The method embeds the watermark in the temporal Fourier domain by exploiting the temporal contrast thresholds, which are obtained by subjective psychovisual experiments. The robustness results show that the watermarking scheme can survive typical video attacks, such as additive Gaussian noise, H.263+ coding, frame dropping and frame averaging. One interesting point in the results is the better robustness of the DC term of the video compared to the AC terms, especially in the H.263+ coding test. While the correlation for the AC terms of the video can no longer be detected below a bitrate of 230-240 kbps, the correlation result for the DC term survives down to 50-60 kbps. One may conclude that H.263+ coding distorts mostly the AC components of the video.
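The temporal-domain embedding can be illustrated with the sketch below. It is not the thesis algorithm: the `thresholds` array is a hypothetical placeholder for the temporal contrast thresholds measured in the psychovisual experiments, and the same perturbation is applied to every pixel, ignoring the spatial part of the actual model.

```python
import numpy as np

def embed_temporal(frames, key, thresholds):
    """Embed a watermark in the temporal Fourier domain of a video.

    frames     -- Y component of the video, shape (T, H, W)
    thresholds -- one (hypothetical) temporal contrast threshold per
                  temporal-frequency bin, length T // 2 + 1
    """
    spectrum = np.fft.rfft(frames, axis=0)   # 1-D FFT along time, per pixel
    rng = np.random.default_rng(key)
    wm = rng.choice([-1.0, 1.0], size=spectrum.shape[0])
    # Keep each modification at the visibility limit of its frequency bin.
    spectrum += (thresholds * wm)[:, None, None]
    return np.fft.irfft(spectrum, n=frames.shape[0], axis=0), wm

def detect_temporal(frames, original, wm, thresholds):
    """Correlate the per-bin spectral change with the embedded pattern."""
    diff = np.fft.rfft(frames, axis=0) - np.fft.rfft(original, axis=0)
    per_bin = diff.mean(axis=(1, 2)).real    # average change in each bin
    return float(np.corrcoef(per_bin, thresholds * wm)[0, 1])
```

The `rfft`/`irfft` pair keeps the watermarked frames real-valued, and in the absence of an attack the detector recovers the embedded pattern exactly; running the correlation separately for the DC bin and the AC bins would reproduce the comparison discussed above.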

While testing the algorithms, their computational complexity was not taken into account, since the main application is assumed to be copyright protection. As noted, computational cost and memory requirements are not a priority in copyright protection: the owner of the content may want to prove his or her ownership even if the watermark detection process takes days to complete. In contrast, if the same idea were used in a broadcast monitoring application, the algorithm would certainly have to take these requirements into account. Although formal tests have not been performed, the complexity of the algorithms is not demanding.

The proposed method embeds the watermark only into the Y component of the video. However, it is possible to extend the scheme by also watermarking the chromatic components. Such an approach would improve robustness without losing imperceptibility, due to the low sensitivity of the HVS to the chromatic components.

One other possible extension of the method can be realized by using the temporal masking phenomenon of the HVS. In such a scheme, the temporal contrast threshold at a specific temporal frequency increases due to the masking effect of a temporal variation at a different temporal frequency. This phenomenon of the HVS can be interpreted as contrast masking in the temporal direction. One may expect the robustness of such a watermarking scheme to be better than that of the proposed method, due to the increase in contrast thresholds.


REFERENCES

[1]   Ingemar J. Cox, Matt L. Miller and Jeffrey A. Bloom, "Watermarking Applications and their Properties", Int. Conf. on Information Technology 2000, Las Vegas, 2000.

[2]   Gerhard C. Langelaar, Iwan Setyawan, and Reginald L. Lagendijk, "Watermarking Digital Image and Video Data", IEEE Signal Processing Magazine, September 2000.

[3]   Maurice Maes, Ton Kalker, Jean-Paul M. G. Linnartz, Joop Talstra, Geert F. G. Depovere, and Jaap Haitsma, "Digital Watermarking for DVD Video Copy Protection", IEEE Signal Processing Magazine, September 2000.

[4]   Fabien A. P. Petitcolas, "Watermarking Schemes Evaluation", IEEE Signal Processing Magazine, September 2000.

[5]   Technical Report, submitted to The Scientific and Technical Research Council of Turkey (Tübitak) under project EEEAG 101E007, April 2002.

[6]   Jean-François Delaigle, "Protection of Intellectual Property of Images by Perceptual Watermarking", Ph.D. Thesis submitted for the degree of Doctor of Applied Sciences, Université Catholique de Louvain, Belgium.

[7]   Ingemar J. Cox, Joe Kilian, Tom Leighton, and Talal Shamoon, "Secure Spread Spectrum Watermarking for Multimedia", IEEE Trans. on Image Processing, Vol. 6, No. 12, pp. 1673-1687, 1997.

[8]   Mitchell D. Swanson, Mei Kobayashi, and Ahmed H. Tewfik, "Multimedia Data-Embedding and Watermarking Technologies", Proceedings of the IEEE, Vol. 86, No. 6, June 1998.

[9]   Mitchell D. Swanson, Bin Zhu, and Ahmed H. Tewfik, "Transparent Robust Image Watermarking", SPIE Conf. on Visual Communications and Image Processing, 1996.

[10]  Christine I. Podilchuk and Wenjun Zeng, "Image-Adaptive Watermarking Using Visual Models", IEEE Journal on Selected Areas in Communications, Vol. 16, No. 4, May 1998.


[11]  Raymond B. Wolfgang, Christine I. Podilchuk and Edward J. Delp, "Perceptual Watermarks for Image and Video", Proceedings of the IEEE, Vol. 87, No. 7, July 1999.

[12]  Sergio D. Servetto, Christine I. Podilchuk, and Kannan Ramchandran, "Capacity Issues in Digital Image Watermarking", in Proceedings of the IEEE International Conference on Image Processing (ICIP), Chicago, IL, October 1998.

[13]  Ingemar J. Cox and Matt L. Miller, "A Review of Watermarking and the Importance of Perceptual Modeling", Proc. of Electronic Imaging '97, February 1997.

[14]  Stefan Winkler and Pierre Vandergheynst, "Computing Isotropic Local Contrast from Oriented Pyramid Decompositions", in Proc. ICIP, Vol. 4, pp. 420-424, Kyoto, Japan, 1999.

[15]  Eli Peli, "Contrast in Complex Images", Journal of the Optical Society of America A, Vol. 7, No. 10, October 1990.

[16]  Jae S. Lim, Two-Dimensional Signal and Image Processing, pp. 429-430, Prentice Hall, 1990.

[17]  T. N. Cornsweet, Visual Perception, pp. 152-154, New York: Academic, 1970.

[18]  A. B. Watson, "DCT Quantization Matrices Visually Optimized for Individual Images", Proc. of SPIE on Human Vision, Visual Processing, and Digital Display IV, 1993.

[19]  G. E. Legge and J. M. Foley, "Contrast Masking in Human Vision", J. Opt. Soc. Am., Vol. 70, No. 12, pp. 1458-1471, 1980.

[20]  A. Watson, G. Yang, J. Solomon, and J. Villasenor, "Visibility of Wavelet Quantization Noise", IEEE Transactions on Image Processing, Vol. 6, No. 8, pp. 1164-1175, August 1997.

[21]  Bernd Girod, "The Information Theoretical Significance of Spatial and Temporal Masking in Video Signals", SPIE Vol. 1077, Human Vision, Visual Processing and Digital Display, 1989.

[22]  Bernd Girod, lecture notes, www.stanford.edu/class/ee392c/lectures/chapter05.pdf.

[23]  Bin Zhu, Ahmed H. Tewfik, and Ömer N. Gerek, "Low Bit Rate Near-Transparent Image Coding", in Proc. of the SPIE Int. Conf. on Wavelet Apps. for Dual Use, Vol. 2491, Orlando, FL, pp. 173-184, 1995.

[24]  Webvision Home Page, http://webvision.med.utah.edu/anatomy.html

[25]  Kimball's Biology Pages, http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/V/Vision.html.


[26]  Zhou Wang and Alan Conrad Bovik, "Embedded Foveation Image Coding", IEEE Transactions on Image Processing, Vol. 10, No. 10, October 2001.

[27]  Wilson S. Geisler and Jeffrey S. Perry, "A Real-Time Foveated Multiresolution System for Low-Bandwidth Video Communication", SPIE Proceedings, Vol. 3299, 1998.

[28]  Amir Said and William A. Pearlman, "A New Fast and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 6, No. 3, June 1996.

[29]  A. B. Watson, "Temporal Sensitivity", in K. Boff, L. Kaufmann and J. Thomas (eds.), Handbook of Human Perception and Performance, Vol. 1, Chapter 6, New York: John Wiley and Sons, 1986.

[30]  D. H. Kelly, "Effects of Sharp Edges in a Flickering Field", Journal of the Optical Society of America, Vol. 49, pp. 730-732, 1959.

[31]  J. A. J. Roufs, "Dynamic Properties of Vision-I. Experimental Relationships between Flicker and Flash Thresholds", Vision Research, Vol. 12, pp. 261-278, 1972.

[32]  J. G. Robson, "Spatial and Temporal Contrast Sensitivity Functions of the Visual System", Journal of the Optical Society of America, Vol. 56, pp. 1141-1142, 1966.

[33]  Andrew B. Watson, James Hu, and John F. McGowan III, "DVQ: A Digital Video Quality Metric Based on Human Vision", Journal of Electronic Imaging, Vol. 10, No. 1, pp. 20-29.

[34]  Deepa Kundur and Dimitrios Hatzinakos, "A Robust Digital Image Watermarking Method Using Wavelet-Based Fusion", IEEE International Conference on Image Processing (ICIP '97), 1997.

[35]  Martin Kutter and Stefan Winkler, "A Vision-Based Masking Model for Spread-Spectrum Image Watermarking", IEEE Transactions on Image Processing, Vol. 11, No. 1, January 2002.

[36]  Mitchell D. Swanson, Bin Zhu and Ahmed H. Tewfik, "Multiresolution Scene-Based Video Watermarking Using Perceptual Models", IEEE Journal on Selected Areas in Communications, Vol. 16, No. 4, May 1998.

[37]  M. Ramkumar, A. N. Akansu and A. A. Alatan, "On the Choice of Transforms for Data Hiding in Compressed Video", Proc. IEEE ICASSP '99, Phoenix, pp. 3049-3052, 1999.

[38]  M. Antonini, M. Barlaud, P. Mathieu and I. Daubechies, "Image Coding Using the Wavelet Transform", IEEE Trans. on Image Processing, Vol. 1, pp. 205-220, February 1992.

[39]  A. Aydın Alatan, Ali N. Akansu and Wayne Wolf, "Multi-Modal Dialog Scene Detection Using Hidden Markov Models for Content-Based Multimedia Indexing", Multimedia Tools and Applications, Vol. 14, pp. 137-151, 2001.


[40]  Ingemar J. Cox, Matthew L. Miller, and Jeffrey A. Bloom, Digital Watermarking, Academic Press, 2002.