Project Report

CHAPTER 1

INTRODUCTION

1. Introduction to Digital Image Processing:

Vision allows humans to perceive and understand the world surrounding us.

Computer vision aims to duplicate the effect of human vision by electronically perceiving

and understanding an image.

Giving computers the ability to see is not an easy task - we live in a three dimensional (3D)

world, and when computers try to analyze objects in 3D space, available visual sensors (e.g.,

TV cameras) usually give two dimensional (2D) images, and this projection to a lower

number of dimensions incurs an enormous loss of information.

In order to simplify the task of computer vision understanding, two levels are usually

distinguished: low-level image processing and high-level image understanding.

Low-level methods usually use very little knowledge about the content of images.

High level processing is based on knowledge, goals, and plans of how to achieve those goals.

Artificial intelligence (AI) methods are used in many cases. High-level computer vision tries

to imitate human recognition and the ability to make decisions according to the information

contained in the image.

This course deals almost exclusively with low-level image processing. High-level image

processing is discussed in the course Image Analysis and Understanding, which is a

continuation of this course.

1.1 History:

Many of the techniques of digital image processing, or digital picture processing as it was often

called, were developed in the 1960s at the Jet Propulsion Laboratory, MIT, Bell Labs, University of

Maryland, and a few other places, with application to satellite imagery, wire photo standards

conversion, medical imaging, videophone, character recognition, and photo enhancement. But the

cost of processing was fairly high with the computing equipment of that era. In the 1970s, digital

image processing proliferated as cheaper computers became available. Imaging is the creation of a

film or electronic image of any picture or paper form. It is accomplished by scanning or

photographing an object and turning it into

a matrix of dots (bitmap), the meaning of which is unknown to the computer, only to the human

viewer. Scanned images of text may be encoded into computer data (ASCII or EBCDIC) with page

recognition software (OCR).

1.2 Basic Concepts:

A signal is a function depending on some variable with physical meaning.

Signals can be

o One-dimensional (e.g., dependent on time),

o Two-dimensional (e.g., images dependent on two co-ordinates in a plane),

o Three-dimensional (e.g., describing an object in space),

o Or higher dimensional.

1.3 Pattern recognition

Pattern recognition is a field within the area of machine learning. Alternatively, it can be defined as

"the act of taking in raw data and taking an action based on the category of the data". As such, it is a

collection of methods for supervised learning.

Pattern recognition aims to classify data (patterns) based on either a priori knowledge or on

statistical information extracted from the patterns. The patterns to be classified are usually groups of

measurements or observations, defining points in an appropriate multidimensional space.

Vector-valued functions are used to represent, for example, color images consisting of three

component colors.
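
As a rough illustration of classifying patterns that are points in a multidimensional space, the sketch below trains a nearest-centroid classifier on three-component color measurements. The class names, the sample values, and the choice of a nearest-centroid rule are assumptions made for this example; the text above does not prescribe a particular classifier.

```python
import math

def centroid(points):
    """Mean of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def train(labelled):
    """Map each class label to the centroid of its training patterns."""
    return {label: centroid(pts) for label, pts in labelled.items()}

def classify(model, pattern):
    """Assign the pattern to the class whose centroid is nearest (Euclidean)."""
    return min(model, key=lambda label: math.dist(model[label], pattern))

# Hypothetical RGB measurements: each pattern is a point in 3D color space.
training = {
    "reddish": [(200, 30, 40), (220, 50, 35), (180, 20, 60)],
    "bluish": [(30, 40, 210), (50, 60, 190), (20, 35, 230)],
}
model = train(training)
print(classify(model, (210, 45, 50)))
print(classify(model, (25, 50, 200)))
```

The labelled training set makes this a supervised method: the statistical information (here only the class means) is extracted from patterns whose category is already known.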

1.4 Image functions:

The image can be modeled by a continuous function of two or three variables.

The arguments are the co-ordinates x, y in a plane; if images change in time, a third variable t

may be added.

The image function values correspond to the brightness at image points.

The function value can express other physical quantities as well (temperature, pressure

distribution, distance from the observer, etc.).
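
The following sketch shows one way to turn a continuous image function f(x, y, t) into a digital image by sampling it on a grid and quantizing the values. The particular function `brightness`, the grid size, and the 256 gray levels are assumptions chosen only for illustration.

```python
import math

def brightness(x, y, t=0.0):
    """Hypothetical continuous image function f(x, y, t) with values in [0, 1]."""
    return 0.5 + 0.5 * math.sin(2 * math.pi * (x + t)) * math.cos(2 * math.pi * y)

def sample(f, width, height, t=0.0):
    """Sample f on a regular grid over the unit square; returns rows of values."""
    return [[f(col / width, row / height, t) for col in range(width)]
            for row in range(height)]

def quantize(image, levels=256):
    """Map values in [0, 1] to integer gray levels 0 .. levels - 1."""
    return [[min(levels - 1, int(value * levels)) for value in row]
            for row in image]

frame = quantize(sample(brightness, width=8, height=4))
for row in frame:
    print(row)
```

Calling `sample` with different values of `t` gives successive frames of a time-varying image.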

The brightness integrates different optical quantities - using brightness as a basic quantity

allows us to avoid the description of the very complicated process of image formation.

The image on the human eye retina or on a TV camera sensor is intrinsically 2D. We shall

call such a 2D image bearing information about brightness points an intensity image.

The real world, which surrounds us, is intrinsically 3D.

The 2D intensity image is the result of a perspective projection of the 3D scene.

When 3D objects are mapped into the camera plane by perspective projection a lot of

information disappears as such a transformation is not one-to-one.
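
A minimal sketch of why the projection is not one-to-one, assuming an ideal pinhole camera with the optical center at the origin and a hypothetical focal length of 1: every 3D point on the same viewing ray lands on the same image point, so depth cannot be recovered from a single image.

```python
def project(point, focal_length=1.0):
    """Pinhole perspective projection of a 3D point (X, Y, Z) with Z > 0."""
    X, Y, Z = point
    return (focal_length * X / Z, focal_length * Y / Z)

near = (1.0, 2.0, 4.0)
far = (2.0, 4.0, 8.0)   # same viewing ray, twice as far from the camera

print(project(near))
print(project(far))
print(project(near) == project(far))
```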

Recognizing or reconstructing objects in a 3D scene from one image is an ill-posed problem.

Recovering information lost by perspective projection is only one, mainly geometric,

problem of computer vision.

The second problem is how to understand image brightness. The only information available

in an intensity image is brightness of the appropriate pixel, which is dependent on a number

of independent factors such as

o Object surface reflectance properties (given by the surface material, microstructure

and marking),

o Illumination properties,

o and object surface orientation with respect to a viewer and light source.

CHAPTER 2

2. DIGITAL IMAGE FORENSICS

Today's technology allows digital media to be altered and manipulated in ways that were impossible

twenty years ago. We are feeling the impact of this technology in nearly every corner of our lives,

from the courts to the media, politics, business, and science. As this technology continues to evolve

it will become increasingly important for the science of digital forensics to keep pace. This

report describes state-of-the-art techniques in digital image forensics.

Digital watermarking has been proposed as a means by which an image can be authenticated. This

approach works by inserting at the time of recording an imperceptible digital code (a watermark)

into the image. With the assumption that tampering will alter a watermark, an image can be

authenticated by verifying that the extracted watermark is the same as that which was inserted. The

major drawback of this approach is that a watermark must be inserted at precisely the time of

recording, which limits this approach to specially equipped digital cameras.

In contrast, recent advances in digital forensics operate in the absence of any watermark or

specialized hardware. With the assumption that tampering disturbs certain underlying statistical

properties of an image, these forensic techniques can detect specific forms of tampering.

Air-brushing or re-touching can be detected by measuring deviations of the underlying color filter

array correlations. Specifically, virtually all digital cameras do not record every color sample

needed for a full-resolution color image. Instead, only a subset of the samples is recorded through a color

filter array (CFA) placed atop the digital sensor. The most frequently used CFA, the Bayer array,

employs three color filters: red, green, and blue. Since only a single color sample is recorded at each

pixel location, the other two color samples must be estimated from the neighboring samples in order

to obtain a three-channel color image. The estimation of the missing color samples is referred to as

CFA interpolation or demosaicking. In its simplest form, the missing pixels are filled in by spatially

averaging the recorded values. Since the CFA is arranged in a periodic pattern, a periodic set of

pixels will be precisely correlated to their neighbors according to the CFA interpolation algorithm.

When an image is re-touched, it is likely that these correlations will be destroyed. As such, the

presence or lack of these correlations can be used to authenticate an image, or expose it as a forgery.
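
The idea can be sketched for the green channel alone, assuming the simplest interpolation described above (averaging the four recorded neighbors) and a checkerboard layout in which green is recorded where row + column is even. The function names and the synthetic data are hypothetical; real detectors estimate the correlations statistically rather than testing for an exact match.

```python
import random

def interpolate_green(raw):
    """Fill missing green samples (odd row + col) with the mean of the four
    recorded neighbours. Border pixels are left untouched in this sketch."""
    h, w = len(raw), len(raw[0])
    out = [row[:] for row in raw]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            if (r + c) % 2 == 1:
                out[r][c] = (raw[r - 1][c] + raw[r + 1][c]
                             + raw[r][c - 1] + raw[r][c + 1]) / 4
    return out

def cfa_residual(green):
    """Largest deviation of interior interpolated sites from their neighbour mean."""
    h, w = len(green), len(green[0])
    worst = 0.0
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            if (r + c) % 2 == 1:
                mean = (green[r - 1][c] + green[r + 1][c]
                        + green[r][c - 1] + green[r][c + 1]) / 4
                worst = max(worst, abs(green[r][c] - mean))
    return worst

rng = random.Random(0)
size = 8
# Recorded samples at even sites; odd sites are placeholders to be interpolated.
raw = [[rng.randrange(256) if (r + c) % 2 == 0 else 0 for c in range(size)]
       for r in range(size)]
green = interpolate_green(raw)

retouched = [row[:] for row in green]
retouched[3][4] += 10            # simulate re-touching one interpolated pixel

print(cfa_residual(green))       # interpolation pattern intact
print(cfa_residual(retouched))   # pattern broken at the edited pixel
```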

A digital composite of two people can be detected by measuring differences in the direction to the

illuminating light sources from their faces and bodies. By making some initial simplifying

assumptions about the light and the surface being illuminated, we can mathematically express how

much light a surface should receive as a function of its position relative to the light. A surface that is

directly facing the light, for example, will be brighter than a surface that is turned away from the

light. Once expressed in this form, standard techniques can be used to determine the direction to the

light source for any object or person in an image. Any inconsistencies in lighting can then be used as

evidence of tampering.
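
As a hedged illustration, the sketch below assumes a Lambertian surface, a single distant light, no ambient term, and known 2D surface normals, so that intensity is simply the dot product of the normal and the light direction. Under those assumptions the light direction follows from a least-squares fit. The two "people" and their lighting angles are synthetic, and the function names are hypothetical.

```python
import math

def unit_normals(angles_deg):
    """2D unit surface normals at the given angles."""
    return [(math.cos(math.radians(a)), math.sin(math.radians(a)))
            for a in angles_deg]

def shade(normals, light_angle_deg):
    """Lambertian intensity n . L for a distant light (all normals face the light)."""
    lx = math.cos(math.radians(light_angle_deg))
    ly = math.sin(math.radians(light_angle_deg))
    return [nx * lx + ny * ly for nx, ny in normals]

def estimate_light_direction(normals, intensities):
    """Least-squares solution of n_i . L = I_i, returned as an angle in degrees."""
    sxx = sum(nx * nx for nx, ny in normals)
    sxy = sum(nx * ny for nx, ny in normals)
    syy = sum(ny * ny for nx, ny in normals)
    bx = sum(nx * i for (nx, ny), i in zip(normals, intensities))
    by = sum(ny * i for (nx, ny), i in zip(normals, intensities))
    det = sxx * syy - sxy * sxy
    lx = (syy * bx - sxy * by) / det
    ly = (sxx * by - sxy * bx) / det
    return math.degrees(math.atan2(ly, lx))

# Person A is lit from 60 degrees, person B (pasted in) from 120 degrees.
normals_a = unit_normals(range(20, 101, 10))
normals_b = unit_normals(range(80, 161, 10))
angle_a = estimate_light_direction(normals_a, shade(normals_a, 60))
angle_b = estimate_light_direction(normals_b, shade(normals_b, 120))
print(round(angle_a, 3), round(angle_b, 3))
```

A large gap between the two estimated angles is the kind of lighting inconsistency that would point to a composite.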

Duplication or cloning is a simple and powerful form of manipulation used to remove objects or

people from an image. This form of tampering can be detected by first partitioning an image into

small blocks. The blocks are then re-ordered so that the distance between any two of them is

proportional to the difference in their pixel colors. With identical and highly similar blocks

neighboring each other in the re-ordered sequence, a region growing algorithm combines any significant

number of neighboring blocks that are consistent with the cloning of an image region. Since it is

statistically unlikely to find identical and spatially coherent regions in an image, their presence can

then be used as evidence of tampering.
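
A simplified sketch of the block-matching step: overlapping blocks are sorted lexicographically by pixel values so that identical blocks become neighbors in the sorted sequence. Lexicographic sorting, the exact-match test, and the tiny synthetic image are assumptions of this example, and the region-growing stage is omitted.

```python
def blocks(image, size):
    """Yield ((row, col), flattened pixels) for every overlapping size x size block."""
    h, w = len(image), len(image[0])
    for r in range(h - size + 1):
        for c in range(w - size + 1):
            yield (r, c), tuple(image[r + i][c + j]
                                for i in range(size) for j in range(size))

def find_duplicate_blocks(image, size=2):
    """Sort blocks by content and report positions of identical neighbours."""
    ordered = sorted(blocks(image, size), key=lambda item: item[1])
    pairs = []
    for (pos_a, px_a), (pos_b, px_b) in zip(ordered, ordered[1:]):
        if px_a == px_b:
            pairs.append((pos_a, pos_b))
    return pairs

# Synthetic 6 x 6 image in which every pixel value is unique ...
image = [[10 * r + c for c in range(6)] for r in range(6)]
# ... until the 2 x 2 patch at (0, 0) is cloned to (3, 3).
for i in range(2):
    for j in range(2):
        image[3 + i][3 + j] = image[i][j]

pairs = find_duplicate_blocks(image)
print(pairs)
```

In a real image, near-identical rather than exactly identical blocks would have to be matched, for instance after compression, which is why the text describes ordering by similarity.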

2.1. DIGITAL WATERMARKING

A digital watermark is a kind of marker covertly embedded in a noise-tolerant signal such as audio

or image data. It is typically used to identify ownership of the copyright of such signal.

"Watermarking" is the process of hiding digital information in a carrier signal; the hidden

information should, but does not need to, contain a relation to the carrier signal. Digital watermarks

may be used to verify the authenticity or integrity of the carrier signal or to show the identity of its

owners. It is prominently used for tracing copyright infringements and for banknote authentication.

Like traditional watermarks, digital watermarks are only perceptible under certain conditions, i.e.

after using some algorithm, and imperceptible otherwise. If a digital watermark distorts the carrier

signal in a way that it becomes perceivable, it is of no use. Traditional watermarks may be applied

to visible media (like images or video), whereas in digital watermarking, the signal may be audio,

pictures, video, texts or 3D models. A signal may carry several different watermarks at the same

time. Unlike metadata that is added to the carrier signal, a digital watermark does not change the size

of the carrier signal.

The needed properties of a digital watermark depend on the use case in which it is applied. For

marking media files with copyright information, a digital watermark has to be rather robust against

modifications that can be applied to the carrier signal. Instead, if integrity has to be ensured, a fragile

watermark would be applied.

Both steganography and digital watermarking employ steganographic techniques to embed data

covertly in noisy signals. But whereas steganography aims for imperceptibility to human senses,

digital watermarking tries to control the robustness as top priority.

Since a digital copy of data is the same as the original, digital watermarking is a passive protection

tool. It just marks data, but neither degrades it nor controls access to it.

One application of digital watermarking is source tracking. A watermark is embedded into a digital

signal at each point of distribution. If a copy of the work is found later, then the watermark may be

retrieved from the copy and the source of the distribution is known. This technique reportedly has

been used to detect the source of illegally copied movies.

Digital watermarking is the process of inserting a digital signal or pattern (indicative of the owner of

the content) into digital content. The signal, known as a watermark, can be used later to identify the

owner of the work, to authenticate the content, and to trace illegal copies of the work.

Watermarks of varying degrees of obtrusiveness are added to presentation media as a guarantee of

authenticity, quality, ownership, and source.

To be effective in its purpose, a watermark should adhere to a few requirements. In particular, it

should be robust and transparent. Robustness requires that it be able to survive any alterations or

distortions that the watermarked content may undergo, including intentional attacks to remove the

watermark, and common signal processing alterations used to make the data more efficient to store

and transmit. This is so that afterwards, the owner can still be identified. Transparency requires a

watermark to be imperceptible so that it does not affect the quality of the content and makes

detection, and therefore removal, by pirates less likely.

The medium of focus in this paper is the still image. There are a variety of image watermarking

techniques, falling into two main categories depending on the domain in which the watermark is

constructed: the spatial domain (producing spatial watermarks) and the frequency domain

(producing spectral watermarks). The effectiveness of a watermark is improved when the technique

exploits known properties of the human visual system. These are known as perceptually based

watermarking techniques. Within this category, the class of image-adaptive watermarks proves most

effective.

2.1.1 Principle of digital watermarks

A watermark on a bank note has a different transparency than the rest of the note when a light is

shined on it. However, this method is useless in the digital world.

Currently there are various techniques for embedding digital watermarks. Basically, they all digitally

write desired information directly onto images or audio data in such a manner that the images or

audio data are not damaged. Embedding a watermark should not result in a significant increase or

reduction in the size of the original data.
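
One very simple spatial-domain way of writing information directly into image data is to overwrite the least significant bit (LSB) of each pixel. The sketch below is only an illustration of that idea and is not the technique of any particular product. The pixel values and the watermark bits are made up, and an LSB watermark is fragile rather than robust.

```python
def embed(pixels, bits):
    """Write one watermark bit into the least significant bit of each pixel."""
    marked = list(pixels)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & ~1) | bit
    return marked

def extract(pixels, count):
    """Read the first `count` watermark bits back from the pixel LSBs."""
    return [p & 1 for p in pixels[:count]]

cover = [52, 55, 61, 66, 70, 61, 64, 73]      # hypothetical 8-bit gray values
watermark = [1, 0, 1, 1, 0, 0, 1, 0]          # hypothetical owner code

marked = embed(cover, watermark)
print(marked)
print(extract(marked, len(watermark)) == watermark)
print(max(abs(a - b) for a, b in zip(cover, marked)))
```

The marked data has the same number of samples as the cover, and no pixel changes by more than one gray level, which is why the change is normally invisible.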

Digital watermarks are added to images or audio data in such a way that they are invisible or

inaudible, that is, unidentifiable by the human eye or ear. Furthermore, they can be embedded in content with

a variety of file formats. Digital watermarking is the content protection method for the multimedia

era.

2.1.2 IMPORTANCE OF DIGITAL WATERMARKS

The Internet has provided worldwide publishing opportunities to creators of various works,

including writers, photographers, musicians and artists. However, these same opportunities provide

ease of access to these works, which has resulted in pirating. It is easy to duplicate audio and visual

files, and it is therefore probable that duplication on the Internet occurs without the rightful owners'

permission.

An example of an area where copyright protection needs to be enforced is in the on-line music

industry. The Recording Industry Association of America (RIAA) says that the value of illegal

copies of music that are distributed over the Internet could reach $2 billion a year.

Digital watermarking is being recognized as a way for improving this situation. RIAA reports that

"record labels see watermarking as a crucial piece of the copy protection system, whether their

music is released over the Internet or on DVD-Audio". They are of the opinion that any encryption

system can be broken, sooner or later, and that digital watermarking is needed to indicate who the

culprit is.

Another scenario in which the enforcement of copyright is needed is in newsgathering. When digital

cameras are used to photograph an event, the images must be watermarked as they are captured. This is

so that later, the image's origin and content can be verified. This suggests that there are many

applications that could require image watermarking, including Internet imaging, digital libraries,

digital cameras, medical imaging, image and video databases, surveillance imaging, video-on-

demand systems, and satellite-delivered video.

2.1.3 PURPOSES OF DIGITAL WATERMARKS

Watermarks are a way of dealing with the problems mentioned above by providing a number of

services:

o They aim to mark digital data permanently and unalterably, so that the source as well as

the intended recipient of the digital work is known. Copyright owners can incorporate

identifying information into their work. That is, watermarks are used in the protection of

ownership. The presence of a watermark in a work suspected of having been copied can

prove that it has been copied.

o By indicating the owner of the work, they demonstrate the quality and assure the

authenticity of the work.

o With a tracking service, owners are able to find illegal copies of their work on the

Internet. In addition, because each purchaser of the data has a unique watermark

embedded in his/her copy, any unauthorized copies that s/he has distributed can be traced

back to him/her.

o Watermarks can be used to identify any changes that have been made to the watermarked

data.

o Some more recent techniques are able to correct the alteration as well.

2.1.4 ATTACKS ON WATERMARKS

Lossy Compression: Many compression schemes like JPEG and MPEG can potentially degrade

the data’s quality through irretrievable loss of data.

Geometric Distortions: include such operations as rotation, translation, scaling and cropping.

Common Signal Processing Operations: These include the following.

o D/A conversion, A/D conversion

o Resampling, requantization, recompression

o Linear filtering such as high-pass and low-pass filtering

o Addition of a constant offset to the pixel values

o Local exchange of pixels

Other intentional attacks:

o Printing and rescanning

o Watermarking of a watermarked image (rewatermarking)

2.1.5 DIGITAL WATERMARKING APPLICATIONS

Digital watermarking is a rapidly evolving field. This section identifies digital watermarking applications

and gives an overview of digital watermarking capabilities and their benefits to customers. The

various applications are:

Authentication

Broadcast Monitoring

Copy Prevention

Forensic Tracking

E-Commerce/Linking

2.1.6 WATERMARKING SOFTWARE & SERVICES

Alpha-Tec: watermarking software for copyright protection and infringement

tracking.

Digimarc: For document verification, copyright protection, embedded messages and

more.

Stegnosign: For creating, embedding and detecting watermarks.

Signum: Allows digital fingerprints to be embedded into graphics, audio, video, etc.

MediaSec: Provides software for various media types, partial encryption, and internet

tracking.

2.2 DIGITAL SIGNATURE

A digital signature or digital signature scheme is a mathematical scheme for demonstrating the

authenticity of a digital message or document. A valid digital signature gives a recipient reason to

believe that the message was created by a known sender such that they cannot deny sending it

(authentication and non-repudiation) and that the message was not altered in transit (integrity).

Digital signatures are commonly used for software distribution, financial transactions, and in other

cases where it is important to detect forgery or tampering.

Digital signatures are often used to implement electronic signatures, a broader term that refers to any

electronic data that carries the intent of a signature,[1] but not all electronic signatures use digital

signatures.[2][3] In some countries, including the United States, India,[4] and members of the European

Union, electronic signatures have legal significance.

Digital signatures employ a type of asymmetric cryptography. For messages sent through a

nonsecure channel, a properly implemented digital signature gives the receiver reason to believe the

message was sent by the claimed sender. Digital signatures are equivalent to traditional handwritten

signatures in many respects, but properly implemented digital signatures are more difficult to forge

than the handwritten type. Digital signature schemes in the sense used here are cryptographically

based, and must be implemented properly to be effective. Digital signatures can also provide non-

repudiation, meaning that the signer cannot successfully claim they did not sign a message, while

also claiming their private key remains secret; further, some non-repudiation schemes offer a time

stamp for the digital signature, so that even if the private key is exposed, the signature is valid.

Digitally signed messages may be anything representable as a bitstring: examples include electronic

mail, contracts, or a message sent via some other cryptographic protocol.

A digital signature scheme typically consists of three algorithms:

A key generation algorithm that selects a private key uniformly at

random from a set of possible private keys. The algorithm outputs the

private key and a corresponding public key.

A signing algorithm that, given a message and a private key,

produces a signature.

A signature verifying algorithm that, given a message, public key and

a signature, either accepts or rejects the message's claim to

authenticity.

Two main properties are required. First, a signature generated from a fixed message and fixed

private key should verify the authenticity of that message by using the corresponding public key.

Secondly, it should be computationally infeasible to generate a valid signature for a party who does

not possess the private key.
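
The three algorithms can be sketched with textbook RSA. This is a toy illustration only: the primes are tiny and fixed instead of being chosen at random, there is no padding, and the hash is reduced modulo a 12-bit modulus, so it offers no security. The function names are hypothetical and do not belong to any library.

```python
import hashlib

def generate_keys(p=61, q=53, e=17):
    """Key generation from two tiny, insecure primes; returns (private, public)."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)              # modular inverse (Python 3.8+)
    return (n, d), (n, e)

def digest(message, n):
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n

def sign(message, private_key):
    """Signing: raise the message digest to the private exponent."""
    n, d = private_key
    return pow(digest(message, n), d, n)

def verify(message, signature, public_key):
    """Verification: accept if the signature opens to the message digest."""
    n, e = public_key
    return pow(signature, e, n) == digest(message, n)

private_key, public_key = generate_keys()
message = b"transfer 100 to account 42"
signature = sign(message, private_key)

print(verify(message, signature, public_key))
print(verify(message, (signature + 1) % public_key[0], public_key))
```

The first check corresponds to the first required property: a signature made with the private key verifies under the matching public key. The second check shows that a different signature value is rejected. With a modulus this small, forging is trivial, so the second property (infeasibility of forgery) holds only for realistically sized keys.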

2.2.1 Uses of Digital Signature

As organizations move away from paper documents with ink signatures or authenticity stamps,

digital signatures can provide added assurance of the provenance, identity, and status of

an electronic document as well as acknowledging informed consent and approval by a signatory. The

United States Government Printing Office (GPO) publishes electronic versions of the budget, public

and private laws, and congressional bills with digital signatures. Universities including Penn

State, University of Chicago, and Stanford are publishing electronic student transcripts with digital

signatures.

Below are some common reasons for applying a digital signature to communications:

2.2.1.1 Authentication

Although messages may often include information about the entity sending a message, that

information may not be accurate. Digital signatures can be used to authenticate the source of

messages. When ownership of a digital signature secret key is bound to a specific user, a valid

signature shows that the message was sent by that user. The importance of high confidence in sender

authenticity is especially obvious in a financial context. For example, suppose a bank's branch office

sends instructions to the central office requesting a change in the balance of an account. If the central

office is not convinced that such a message is truly sent from an authorized source, acting on such a

request could be a grave mistake.

2.2.1.2 Integrity

In many scenarios, the sender and receiver of a message may have a need for confidence that the

message has not been altered during transmission. Although encryption hides the contents of a

message, it may be possible to change an encrypted message without understanding it. (Some

encryption algorithms, known as nonmalleable ones, prevent this, but others do not.) However, if a

message is digitally signed, any change in the message after signature will invalidate the signature.

Furthermore, there is no efficient way to modify a message and its signature to produce a new

message with a valid signature, because doing so is considered computationally infeasible for

most cryptographic hash functions (see collision resistance).

2.2.1.3 Non-repudiation

Non-repudiation, or more specifically non-repudiation of origin, is an important aspect of digital

signatures. By this property, an entity that has signed some information cannot at a later time deny

having signed it. Similarly, access to the public key alone does not enable a fraudulent party to fake a

valid signature.

The device signature may be in the form of:

o Sensor pattern noise (SPN)

o Camera response function

o Re-sampling artifacts

o Color filter array (CFA) interpolation artifacts

o JPEG compression

o Lens aberration

o Sensor dust

CHAPTER 3

COLOR FILTER ARRAY

3. Color filter array

The Bayer color filter mosaic. Each two-by-two submosaic contains 2 green, 1 blue and 1 red filter, each covering one pixel sensor.

In photography, a color filter array (CFA), or color filter mosaic (CFM), is a mosaic of tiny color

filters placed over the pixel sensors of an image sensor to capture color information.
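
A small sketch of how such a mosaic samples a scene, assuming a Bayer tile with green and red on the first row and blue and green on the second. The ordering within the tile varies between cameras, and the tiny RGB image is invented for the example.

```python
from collections import Counter

# One common ordering of the 2 x 2 Bayer tile (an assumption; cameras differ).
BAYER_TILE = (("G", "R"),
              ("B", "G"))

def bayer_color(row, col):
    """Color of the filter that sits over the pixel sensor at (row, col)."""
    return BAYER_TILE[row % 2][col % 2]

def mosaic(rgb_image):
    """Keep only the color sample that the filter over each pixel lets through."""
    index = {"R": 0, "G": 1, "B": 2}
    return [[pixel[index[bayer_color(r, c)]] for c, pixel in enumerate(row)]
            for r, row in enumerate(rgb_image)]

tile_counts = Counter(bayer_color(r, c) for r in range(2) for c in range(2))
print(dict(tile_counts))

scene = [[(1, 2, 3), (4, 5, 6)],
         [(7, 8, 9), (10, 11, 12)]]      # hypothetical (R, G, B) values
print(mosaic(scene))
```

Each sensor pixel ends up with a single number, which is why a later demosaicing step is needed to recover three color values per pixel.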

Color filters are needed because the typical photosensors detect light intensity with little or no

wavelength specificity, and therefore cannot separate color information. Since sensors are made of

semiconductors they obey solid-state physics.

The color filters filter the light by wavelength range, such that the separate filtered intensities include

information about the color of light. For example, the Bayer filter gives

information about the intensity of light in red, green, and blue (RGB) wavelength regions. The raw

image data captured by the image sensor is then converted to a full-color image (with intensities of

all three primary colors represented at each pixel) by a demosaicing algorithm which is tailored for

each type of color filter. The spectral transmittance of the CFA elements along with the demosaicing

algorithm jointly determine the color rendition. The sensor's passband quantum efficiency and span

of the CFA's spectral responses are typically wider than the visible spectrum, thus all visible colors

can be distinguished. The responses of the filters do not generally correspond to the CIE color

matching functions, so a color translation is required to convert the tristimulus values into a

common, absolute color space.

The Foveon X3 sensor uses a different structure such that a pixel utilizes properties of multi-

junctions to stack blue, green, and red sensors on top of each other. This arrangement does not

require a demosaicing algorithm because each pixel has information about each color. Dick Merrill

of Foveon distinguishes the approaches as "vertical color filter" for the Foveon X3 versus "lateral

color filter" for the CFA.

List of color filter arrays

Name           Pattern size (pixels)   Description
Bayer filter   2×2                     Very common RGB filter, with one blue, one red, and two green.
RGBE filter    2×2                     Bayer-like with one of the green filters modified to "emerald"; used in a few Sony cameras.
CYYM filter    2×2                     One cyan, two yellow, and one magenta; used in a few cameras of Kodak.
CYGM filter    2×2                     One cyan, one yellow, one green, and one magenta; used in a few cameras.
RGBW Bayer     2×2                     Traditional RGBW similar to Bayer and RGBE patterns.
RGBW #1        4×4                     Three example RGBW filters from Kodak, with 50% white (see Bayer filter#Alternatives).
RGBW #2        4×4                     (as for RGBW #1)
RGBW #3        2×4                     (as for RGBW #1)

3.1 Manufacture of the CFA

Diazonaphthoquinone (DNQ)-novolac photoresist is one material used as the carrier for making color

filters from color dyes. There is some interference between the dyes and the ultraviolet light needed

to properly expose the polymer, though solutions have been found for this problem. Color

photoresists sometimes used include those with chemical monikers CMCR101R, CMCR101G,

CMCR101B, CMCR106R, CMCR106G, and CMCR106B.

A few sources discuss other specific chemical substances, attendant optical properties, and optimal

manufacturing processes of color filter arrays.

For instance, Nakamura said that materials for on-chip color filter arrays fall into two categories:

pigment and dye. Pigment based CFAs have become the dominant option because they offer higher

heat resistance and light resistance compared to dye based CFAs. In either case, thicknesses ranging

up to 1 micrometre are readily available.

Theuwissen says "Previously, the color filter was fabricated on a separate glass plate and glued to

the CCD (Ishikawa 1981), but nowadays, all single-chip color cameras are provided with an imager

which has a color filter on-chip processed (Dillon, 1978) and not as a hybrid." He provides a

bibliography focusing on the number, types, aliasing effects, moire patterns, and spatial frequencies

of the absorptive filters.

Some sources indicate that the CFA can be manufactured separately and affixed after the sensor has

been manufactured, while other sensors have the CFA manufactured directly on the surface of the

imager. Theuwissen makes no mention of the materials utilized in CFA manufacture.

At least one early example of an on-chip design utilized gelatin filters (Aoki et al., 1982). [15] The

gelatin is sectionalized, via photolithography, and subsequently dyed. Aoki reveals that a CYWG

arrangement was used, with the G filter being an overlap of the Y and C filters.

Filter materials are manufacturer specific. Adams et al. state "Several factors influence the CFA's

design. First, the individual CFA filters are usually layers of transmissive (absorptive) organic or

pigment dyes. Ensuring that the dyes have the right mechanical properties—such as ease of

application, durability, and resistance to humidity and other atmospheric stresses—is a challenging

task. This makes it difficult, at best, to fine-tune the spectral responsivities."

Given that the CFAs are deposited on the image sensor surface at the BEOL (back end of line, the

later stages of the integrated circuit manufacturing line), where a low-temperature regime must be

rigidly observed (due to the low melting temperature of the aluminum metalized "wires" and the

substrate mobility of the dopants implanted within the bulk silicon), organics would be preferred

over glass. On the other hand, some CVD silicon oxide processes are low temperature processes.

Ocean Optics has indicated that their patented dichroic filter CFA process (alternating thin films of

ZnS and Cryolite) can be applied to spectroscopic CCDs. Gersteltec sells photoresists that possess

color filter properties.

3.2 Some pigment and dye molecules used in CFAs

In U.S. Patent No. 4,808,501, Carl Chiulli cites the use of 5 chemicals, three of which are C.I. #12715, AKA

Solvent Red 8; Solvent Yellow 88; and C.I. #61551, Solvent Blue 36. In U.S. Patent No. 5,096,801, Koya et

al., of Fuji Photo Film company, list some 150-200 chemical structures, mainly azo dyes and

pyrazolone-diazenyl, but fail to provide chemical names, CAS Registry numbers, or Colour Index

numbers.

3.3 IMAGE NOISE

Image noise is random (not present in the object imaged) variation of brightness or color

information in images, and is usually an aspect of electronic noise. It can be produced by the sensor

and circuitry of a scanner or digital camera. Image noise can also originate in film grain and in the

unavoidable shot noise of an ideal photon detector. Image noise is an undesirable by-product of

image capture that adds spurious and extraneous information.

Noise clearly visible in an image from a digital camera

The original meaning of "noise" was and remains "unwanted signal"; unwanted electrical

fluctuations in signals received by AM radios caused audible acoustic noise ("static"). By analogy

unwanted electrical fluctuations themselves came to be known as "noise." Image noise is, of course,

inaudible.

The magnitude of image noise can range from almost imperceptible specks on a digital photograph

taken in good light, to optical and radioastronomical images that are almost entirely noise, from

which a small amount of information can be derived by sophisticated processing (a noise level that

would be totally unacceptable in a photograph since it would be impossible to determine even what

the subject was).

3.4 Types

o Amplifier noise (Gaussian noise)

o Salt-and-pepper noise

o Shot noise

o Dark current noise

o Quantization noise (uniform noise)

o Read noise

o Anisotropic noise

3.4.1 Amplifier noise (Gaussian noise)

The standard model of amplifier noise is additive, Gaussian, independent at each pixel and

independent of the signal intensity, caused primarily by Johnson–Nyquist noise (thermal noise),

including that which comes from the reset noise of capacitors ("kTC noise"). Amplifier noise is a

major part of the "read noise" of an image sensor, that is, of the constant noise level in dark areas of

the image. In color cameras where more amplification is used in the blue color channel than in the

green or red channel, there can be more noise in the blue channel.

3.4.2 Salt-and-pepper noise

Image with salt and pepper noise

Fat-tail distributed or "impulsive" noise is sometimes called salt-and-pepper noise or spike noise. An

image containing salt-and-pepper noise will have dark pixels in bright regions and bright pixels in

dark regions. This type of noise can be caused by analog-to-digital converter errors, bit errors in

transmission, etc. It can be mostly eliminated by using dark frame subtraction and interpolating

around dark/bright pixels.

Dead pixels in an LCD monitor produce a similar, but non-random, display.
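
The idea of interpolating around dark/bright pixels can be sketched as follows. This is a minimal illustration, not a camera pipeline: the flat grey test image, the 2% impulse rates and the helper name median3x3 are assumptions made for the example, and a 3x3 median stands in for whatever interpolation a real device uses.

```python
import numpy as np

rng = np.random.default_rng(5)

clean = np.full((64, 64), 128.0)           # hypothetical flat grey test image
noisy = clean.copy()
u = rng.random(clean.shape)
noisy[u < 0.02] = 0.0                      # "pepper": dark impulses
noisy[u > 0.98] = 255.0                    # "salt": bright impulses


def median3x3(img):
    """Median of each pixel's 3x3 neighbourhood (edge pixels are replicated)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    windows = [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    return np.median(np.stack(windows), axis=0)


# Only pixels flagged as impulses are replaced; all others are left untouched.
impulses = (noisy == 0.0) | (noisy == 255.0)
repaired = np.where(impulses, median3x3(noisy), noisy)

print("fraction of impulse pixels:", impulses.mean())
print("pixels still wrong after repair:", int((repaired != clean).sum()))
```

Replacing only the flagged pixels keeps the rest of the image unchanged, which is why this kind of repair costs little detail when impulses are sparse.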

3.4.3 Shot noise

The dominant noise in the lighter parts of an image from an image sensor is typically that caused by

statistical quantum fluctuations, that is, variation in the number of photons sensed at a given

exposure level. This noise is known as photon shot noise. Shot noise has a root-mean-square value

proportional to the square root of the image intensity, and the noises at different pixels are

independent of one another. Shot noise follows a Poisson distribution, which is usually not very

different from Gaussian.
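
The square-root relationship can be checked numerically. The sketch below assumes ideal Poisson photon counting with no other noise source; the photon levels and the helper name shot_noise_std are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)


def shot_noise_std(mean_photons, n_pixels=200_000):
    """Empirical standard deviation of Poisson photon counts at one exposure level."""
    counts = rng.poisson(lam=mean_photons, size=n_pixels)
    return counts.std()


for mean_photons in (10, 100, 1000, 10000):
    measured = shot_noise_std(mean_photons)
    expected = np.sqrt(mean_photons)
    print(f"mean={mean_photons:>6}  measured std={measured:8.2f}  sqrt(mean)={expected:8.2f}")
```

The measured spread tracks the square root of the mean count, so the signal-to-noise ratio also grows as the square root of the signal.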

In addition to photon shot noise, there can be additional shot noise from the dark leakage current in

the image sensor; this noise is sometimes known as "dark shot noise" or "dark-current shot noise".

Dark current is greatest at "hot pixels" within the image sensor. The variable dark charge of normal

and hot pixels can be subtracted off (using "dark frame subtraction"), leaving only the shot noise, or

random component, of the leakage. If dark-frame subtraction is not done, or if the exposure time is

long enough that the hot pixel charge exceeds the linear charge capacity, the noise will be more than

just shot noise, and hot pixels appear as salt-and-pepper noise.

3.4.4 Dark current noise

Dark current is the result of imperfections or impurities in the depleted bulk silicon or at the silicon-

silicon dioxide interface. These sites introduce electronic states in the forbidden gap which act as

steps between the valence and conduction bands, providing a path for valence electrons to sneak into

the conduction band, adding to the signal measured in the pixel. The efficiency of a generation

center depends on its energy level, with states near mid-band generating most of the dark current.

The generation of dark current is a thermal process wherein electrons use thermal energy to hop to

an intermediate state, from which they are emitted into the conduction band. For this reason, the

most effective way to reduce dark current is to cool the CCD, robbing electrons of the thermal

energy required to reach an intermediate state.

3.4.5 Quantization noise (uniform noise)

The noise caused by quantizing the pixels of a sensed image to a number of discrete levels is known

as quantization noise. It has an approximately uniform distribution. Though it can be signal

dependent, it will be signal independent if other noise sources are big enough to cause dithering, or if

dithering is explicitly applied.
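
As a rough numerical check of the uniform model, the sketch below quantizes a hypothetical analogue signal to 8-bit levels and compares the error spread with the value step/sqrt(12) expected for a uniform distribution. The signal and the step size are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

step = 1.0 / 255                                # quantization step for 8-bit levels on [0, 1]
signal = rng.uniform(0.0, 1.0, size=500_000)    # hypothetical analogue pixel values
quantized = np.round(signal / step) * step      # nearest discrete level
error = quantized - signal

print("largest error / step:", np.abs(error).max() / step)
print("measured std:        ", error.std())
print("step / sqrt(12):     ", step / np.sqrt(12))
```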

3.4.6 Read noise

Read noise is a property that is inherent to the CCD of digital cameras, and is present in all images

taken and recorded by a camera. The read noise of a camera affects how well the image represents

the actual data, since high read noise decreases the quality of the image. Calibrating the read noise

allows us to know more about the quality of the CCD as well as the data distortion due to the reading of

images.

3.4.7 Anisotropic noise

Some noise sources show up with a significant orientation in images. For example, image sensors

are sometimes subject to row noise or column noise.[13]

3.5 In digital cameras

Image on the left has exposure time of >10 seconds in low light. The image on the right has adequate

lighting and 0.1 second exposure.

In low light, correct exposure requires the use of long shutter speeds, higher gain (ISO sensitivity),

or both. On most cameras, longer shutter speeds lead to increased salt-and-pepper noise due to

photodiode leakage currents. At the cost of a doubling of read noise variance (41% increase in read

noise standard deviation), this salt-and-pepper noise can be mostly eliminated by dark frame

subtraction. Banding noise, similar to shadow noise, can be introduced through brightening shadows

or through color-balance processing.
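
The trade-off of dark frame subtraction can be reproduced with a small simulation. This is a sketch under simplifying assumptions: the hot-pixel pattern is perfectly repeatable, read noise is Gaussian and independent between frames, and shot noise is ignored. The numbers (read_sigma, hot-pixel level, scene level) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(2)
shape = (512, 512)
read_sigma = 5.0                                   # hypothetical read noise (digital numbers)

# Fixed hot-pixel pattern: identical in every frame taken with the same settings.
hot = np.zeros(shape)
hot[rng.random(shape) < 0.001] = 200.0

light_frame = 50.0 + hot + rng.normal(0.0, read_sigma, shape)   # uniform scene at level 50
dark_frame = hot + rng.normal(0.0, read_sigma, shape)           # same exposure, shutter closed
corrected = light_frame - dark_frame

print("max before / after:", light_frame.max(), corrected.max())
print("noise std after subtraction / read_sigma:", corrected.std() / read_sigma)
```

The hot pixels cancel, while the two independent read-noise terms add in variance, so the standard deviation rises by a factor of about sqrt(2), i.e. roughly 41%.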

The relative effect of both read noise and shot noise increases as the exposure is reduced,

corresponding to increased ISO sensitivity, since fewer photons are counted (shot noise) and since

more amplification of the signal is necessary.

3.6 Effects of sensor size

The size of the image sensor, or effective light collection area per pixel sensor, is the largest

determinant of signal levels that determine signal-to-noise ratio and hence apparent noise levels,

assuming the aperture area is proportional to sensor area, or that the f-number or focal-plane

illuminance is held constant. That is, for a constant f-number, the sensitivity of an imager scales

roughly with the sensor area, so larger sensors typically create lower noise images than smaller

sensors. In the case of images bright enough to be in the shot noise limited regime, when the image

is scaled to the same size on screen, or printed at the same size, the pixel count makes little

difference to perceptible noise levels – the noise depends primarily on sensor area, not how this area

is divided into pixels. For images at lower signal levels (higher ISO settings), where read noise

(noise floor) is significant, more pixels within a given sensor area will make the image noisier if the

per pixel read noise is the same.

For instance, the noise level produced by a Four Thirds sensor at ISO 800 is roughly equivalent to

that produced by a full frame sensor (with roughly four times the area) at ISO 3200, and that

produced by a 1/2.5" compact camera sensor (with roughly 1/16 the area) at ISO 100. This ability to

produce acceptable images at higher sensitivities is a major factor driving the adoption of DSLR

cameras, which tend to use larger sensors than compacts. An example shows a DSLR sensor at ISO

400 creating less noise than a point-and-shoot sensor at ISO 100.

3.7 Sensor heat

Temperature can also have an effect on the amount of noise produced by an image sensor due to

leakage. With this in mind, it is known that DSLRs will produce more noise during summer than

winter.

3.8 Image noise reduction

Most algorithms for converting image sensor data to an image, whether in-camera or on a computer,

involve some form of noise reduction. There are many procedures for this, but all attempt to

determine whether the actual differences in pixel values constitute noise or real photographic detail,

and average out the former while attempting to preserve the latter. However, no algorithm can make

this judgment perfectly, so there is often a tradeoff made between noise removal and preservation of

fine, low-contrast detail that may have characteristics similar to noise. Many cameras have settings

to control the aggressiveness of the in-camera noise reduction.

A simplified example of the impossibility of unambiguous noise reduction: an area of uniform red in

an image might have a very small black part. If this is a single pixel, it is likely (but not certain) to be

spurious and noise; if it covers a few pixels in an absolutely regular shape, it may be a defect in a

group of pixels in the image-taking sensor (spurious and unwanted, but not strictly noise); if it is

irregular, it may be more likely to be a true feature of the image. But a definitive answer is not

available.

This decision can be assisted by knowing the characteristics of the source image and of human

vision. Most noise reduction algorithms perform much more aggressive chroma noise reduction,

since there is little important fine chroma detail that one risks losing. Furthermore, many people find

luminance noise less objectionable to the eye, since its textured appearance mimics the appearance

of film grain.

The high sensitivity image quality of a given camera (or RAW development workflow) may depend

greatly on the quality of the algorithm used for noise reduction. Since noise levels increase as ISO

sensitivity is increased, most camera manufacturers increase the noise reduction aggressiveness

automatically at higher sensitivities. This leads to a breakdown of image quality at higher

sensitivities in two ways: noise levels increase and fine detail is smoothed out by the more

aggressive noise reduction.

In cases of extreme noise, such as astronomical images of very distant objects, it is not so much a

matter of noise reduction as of extracting a little information buried in a lot of noise; techniques are

different, seeking small regularities in massively random data.

CHAPTER 4

FIXED PATTERN NOISE

Fixed pattern noise is the term given to a particular noise pattern on digital imaging sensors often

noticeable during longer exposure shots where particular pixels are susceptible to giving brighter

intensities above the general background noise.

Fixed pattern noise (FPN) is a general term that identifies a temporally constant lateral non-

uniformity (forming a constant pattern) in an imaging system with multiple detector or picture

elements (pixels). It is characterized by the same pattern of 'hot' (brighter) and 'cold' (darker) pixels

occurring with images taken under the same illumination conditions in an imaging array. This

problem arises from small differences in the individual responsivity of the sensor array (including

any local post amplification stages) that might be caused by variations in the pixel size, material or

interference with the local circuitry. It might be affected by changes in the environment like different

temperatures, exposure times, etc.

The term "fixed pattern noise" usually refers to two parameters.[1] One is the DSNU (dark signal

non-uniformity), which is the offset from the average across the imaging array at a particular setting

(temperature, integration time) with no external illumination. The other is the PRNU (photo response

non-uniformity), which describes the gain or ratio between optical power on a pixel versus the electrical

signal output. The latter can be described as the local, pixel dependent photo response non-linearity

(PRNL) and is often simplified as a single value measured at almost saturation level to

permit a linear approximation of the non-linear pixel response. Sometimes pixel noise,[2] defined as the

average deviation from the array average under different illumination and temperature conditions, is

specified. Pixel noise therefore gives a number (commonly expressed in rms) that identifies FPN in

all permitted imaging conditions, which might strongly deteriorate if additional electrical gain (and

noise) is included.

In practice, a long exposure (integration time) emphasizes the inherent differences in pixel response

so they may become a visible defect, degrading the image. Although FPN does not change

appreciably across a series of captures, it may vary with integration time, imager temperature,

imager gain and incident illumination. It is not expressed in a random (uncorrelated or changing)

spatial distribution, but occurs only at certain fixed pixel locations.

One of the few engineering definitions for PRNU, or "photoresponse nonuniformity", is in the

photonics dictionary, and it applies to CCDs only.

4.1 PRNU (Photo Response Non-Uniformity)

4.1.1 Background

Photo Response Non-Uniformity, or PRNU for short, is one source of pattern noise in digital

cameras. Like DSNU, it is seen as the variation in pixel responsivity over the CCD. However, while

DSNU occurs as a variation in pixel responsivity when the CCD is not illuminated, PRNU is the

pixel variation under illumination.

4.1.2 Methods

To characterize the PRNU, we use the camera to take multiple images of a uniform scene, produced

by the Optoliner. We kept the illumination level fixed at 3.00 candelas since the brighter light is

more easily detected by the camera, and we also checked to ensure that the camera is focused before

taking the pictures. We took 100 exposures each for three exposure times: 1/10, 1/4 and 1/2.5 s.

The calculation of the PRNU is as follows:

(i) Obtain the average image over the 100 images taken.

(ii) Subtract the DSNU image from this average image to eliminate the contribution from

the DSNU.

(iii) Obtain the spatial variance of the pixel values over the entire CCD.

(iv) Divide the spatial variance by the average image from (ii) to obtain the PRNU as a

percentage of the actual pixel values.

(v) Repeat the calculations for the different exposure times to compare the PRNU.
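
The calculation steps above can be sketched on simulated data. Everything numeric here is an assumption for illustration: a 1% pixel-to-pixel gain spread, a fixed dark offset, Gaussian temporal noise, and signal levels standing in for the three exposure times. The sketch also uses the spatial standard deviation rather than the variance in the ratio, so that the result is a dimensionless percentage; that choice is ours, not a statement of the exact computation used for the measurements.

```python
import numpy as np

rng = np.random.default_rng(3)
shape = (128, 128)
n_frames = 100

gain = 1.0 + rng.normal(0.0, 0.01, shape)      # hypothetical 1% photo-response spread
dark_offset = rng.normal(10.0, 1.0, shape)     # hypothetical fixed dark signal (DSNU source)


def capture(signal_level):
    """Simulate one frame: fixed-pattern terms plus temporal (read) noise."""
    return gain * signal_level + dark_offset + rng.normal(0.0, 2.0, shape)


def prnu_percent(flat_frames, dark_frames):
    mean_flat = np.mean(flat_frames, axis=0)       # average over the exposures
    dsnu_image = np.mean(dark_frames, axis=0)      # average dark image
    corrected = mean_flat - dsnu_image             # remove the DSNU contribution
    spatial_std = corrected.std()                  # spatial spread over the whole array
    return 100.0 * spatial_std / corrected.mean()  # relative to the mean pixel value


dark = [capture(0.0) for _ in range(n_frames)]
for level in (500.0, 1000.0, 2000.0):              # stand-ins for the three exposure times
    flat = [capture(level) for _ in range(n_frames)]
    print(f"signal level {level:7.1f}: PRNU = {prnu_percent(flat, dark):.3f} %")
```

In this idealised linear model the absolute spread grows with the signal level, while the ratio stays close to the simulated 1% gain spread.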

We expect the PRNU to increase with increasing illumination, since increasing the illumination level

will enhance the difference in the photo-response of the pixels across the image and lead to a higher

PRNU. In our measurements, since the maximum value of the Optoliner device is around 4

candelas, and increasing the illumination level increases the non-uniformity of the illumination

produced by the Optoliner, we chose to increase the exposure times to mimic the effect of increasing

illumination levels.

The dominating component of sensor pattern noise is photo response non-uniformity (PRNU).

However, the PRNU can be contaminated by various types of noise introduced at different stages of

the image acquisition process. Figure 1 demonstrates the image acquisition process. A colour photo

is represented in three colour components (i.e., R, G, and B). For most digital cameras, during the

image acquisition process, the lenses let through the rays of the three colour components of the

scene, but for every pixel only the rays of one colour component are passed through the CFA and

subsequently converted into electronic signals by the sensor. This colour filtering is determined by

the CFA. After the conversion, a colour interpolation function generates the electronic signals of the

other two colour components for every pixel according to the colour intensities of the neighboring

pixels. This colour interpolation process is commonly known as demosaicking. The signals then

undergo additional signal processing such as white balance, gamma correction and image

enhancement. Finally, these signals are stored in the camera’s memory in a customized format,

primarily the JPEG format.

In acquiring an image, the signal will inevitably be distorted when passing through each process and

these distortions result in slight differences between the scene and the camera-captured image. As

formulated in [11], a camera output model can be expressed as

$$I = g^{\gamma} \cdot [(1 + K) Y + \Lambda + \Theta_s + \Theta_r]^{\gamma} + \Theta_q \qquad (1)$$

where $I$ is the output image, $Y$ is the input signal of the scene, $g$ is the colour channel gain, $\gamma$ (=

0.455) is the gamma correction factor, $K$ is the zero-mean multiplicative factor responsible for the

PRNU, and $\Lambda$, $\Theta_s$, $\Theta_r$ and $\Theta_q$ stand for dark current, shot noise, read-out noise and quantization (lossy

compression) noise, respectively. In Eq. (1), $\Theta_s$ and $\Theta_r$ are random noise and $\Lambda$ is the fixed pattern

noise (FPN) that is associated with every camera and can be removed by subtracting a dark frame

from the image taken by the same camera. Since $Y$ is the dominating term in Eq. (1), after applying

Taylor expansion to Eq. (1) and keeping the first two terms of the expansion, we obtain

$$I = I^{(0)} + \gamma I^{(0)} K + \Theta \qquad (2)$$

where $I^{(0)} = (gY)^{\gamma}$ is the denoised image and $\Theta$ is the ensemble of the noises, including $\Lambda$, $\Theta_s$, $\Theta_r$ and $\Theta_q$.

The PRNU pattern noise $K$ can then be formulated as

$$K = \frac{W}{\gamma I^{(0)}} \qquad (3)$$

where

$$W = I - I^{(0)} \qquad (4)$$

is the noise residual obtained by applying a denoising filter on image I. Although various denoising

filters can be used, the wavelet-based denoising process (i.e., the discrete wavelet transform

followed by a Wiener filtering operation), has been reported as effective in producing good results.

4.2 Use of PRNU in Device Identification

The basic idea of using the PRNU noise pattern in device identification can be described as follows.

1) First, for each imaging device d, the noise residual patterns are extracted using Eq. (4) from a

number of low-contrast images taken by device d and then the PRNU is estimated using the

ML estimation procedure adopted by Chen et al., i.e.,

$$K_d = \frac{1}{\gamma} \cdot \frac{\sum_{s=1}^{S} W_{d,s} I_{d,s}}{\sum_{s=1}^{S} (I_{d,s})^2} \qquad (5)$$

where $S$ is the number of images involved in the calculation, $\gamma$ is the gamma correction factor,

$I_{d,s}$ is the s-th image taken by device d and $W_{d,s}$ is the noise residual extracted from $I_{d,s}$.

Note the multiplication operation in Eq. (5) is element-wise.

2) Secondly, the noise residual $W_I$ of image I under investigation is extracted using Eq. (4) and

compared against the reference PRNU $K_d$ of each device d available to the investigator in the hope

that it will match one of the reference fingerprints, thus identifying the source device that has taken

the image under investigation. The normalised cross-correlation

$$\rho(I K_d, W_I) = \frac{(W_I - \bar{W}_I) \cdot (I K_d - \overline{I K_d})}{\lVert W_I - \bar{W}_I \rVert \, \lVert I K_d - \overline{I K_d} \rVert} \qquad (6)$$

is used to compare the noise $W_I$ against the reference fingerprint $K_d$, where the overbar is the mean function.

Note in Eq. (6), instead of using $K_d$, we used $I K_d$ as suggested in [11]. Again the multiplication

operation in Eq. (6) is element-wise.
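
The two steps can be sketched end to end on synthetic data. Several things here are assumptions for illustration only: the cameras are simulated with random multiplicative patterns, the test images are flat, and a 3x3 mean filter (box_denoise) stands in for the wavelet-based denoiser. Constant scale factors such as the gamma correction factor are left out of the fingerprint estimate because the normalised cross-correlation does not depend on them.

```python
import numpy as np

rng = np.random.default_rng(4)
shape = (64, 64)


def box_denoise(img, radius=1):
    """Stand-in denoiser: mean filter. NOT the wavelet/Wiener filter of the text."""
    h, w = img.shape
    size = 2 * radius + 1
    padded = np.pad(img, radius, mode="reflect")
    out = np.zeros((h, w))
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w]
    return out / size**2


def residual(img):
    """Noise residual: image minus its denoised version."""
    return img - box_denoise(img)


def estimate_fingerprint(images):
    """ML-style estimate: sum(W * I) / sum(I * I), element-wise (scale factor omitted)."""
    num = sum(residual(im) * im for im in images)
    den = sum(im * im for im in images)
    return num / den


def ncc(a, b):
    """Normalised cross-correlation between two arrays of equal shape."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))


def shoot(K, level=150.0):
    """Simulated flat image: scene * (1 + K) plus random noise."""
    return level * (1.0 + K) + rng.normal(0.0, 1.0, shape)


cameras = {name: rng.normal(0.0, 0.02, shape) for name in ("C1", "C2")}   # hypothetical devices
references = {name: estimate_fingerprint([shoot(K) for _ in range(30)])
              for name, K in cameras.items()}

query = shoot(cameras["C2"])                       # image under investigation
scores = {name: ncc(residual(query), query * ref) for name, ref in references.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

The query residual is compared against the image multiplied by each reference fingerprint, and the device with the highest correlation is taken as the source.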

Given the potential of PRNU-based approaches to resolve the device identification problem to the

accuracy of the individual device level, it is important that the PRNU extracted is as close to the genuine

pattern noise due to the sensor as possible. Since for most cameras, only one of the three colours of

each pixel is physically captured by the sensor while the other two are artificially interpolated by the

demosaicking process, this inevitably introduces noise with power stronger than that of the genuine

PRNU. We can see from Eq. (2), (3) and (4) that the accuracy of both PRNU K and noise residual W

depends on the denoising operation applied to I in obtaining the denoised image $I^{(0)}$. However, as mentioned earlier,

the most common method of obtaining $I^{(0)}$ is to apply the discrete wavelet transform followed by a

Wiener filtering operation directly to the entire image I without differentiating physical components

from artificial components, which allows the interpolation noise in the artificial

components to contaminate the real PRNU in the physical components. Addressing this shortcoming

is the motivation of this work. In this work, we will look at the impact of demosaicking on PRNU

fidelity in Section II and propose an improved formula for extracting PRNU in Section III. In

Section IV, we present some experiments on device identification and image content integrity

verification to validate the proposed PRNU extraction formula. Section V concludes this work.

Because the PRNU is formulated in Eq. (3) and (5) as a function of the noise residual W (i.e., Eq.

(4)), in the rest of the work we will use the two terms, PRNU and noise residual, interchangeably

whenever there is no need to differentiate them.

4.3 DEMOSAICING

A demosaicing (also de-mosaicing or demosaicking) algorithm is a digital image process used to

reconstruct a full color image from the incomplete color samples output from an image

sensor overlaid with a color filter array (CFA). It is also known as CFA interpolation or color

reconstruction.

Most modern digital cameras acquire images using a single image sensor overlaid with a CFA, so

demosaicing is part of the processing pipeline required to render these images into a viewable

format.

Many modern digital cameras can save images in a raw format allowing the user to demosaic it

using software, rather than using the camera's built-in firmware.

The aim of a demosaicing algorithm is to reconstruct a full color image (i.e. a full set of color triples)

from the spatially undersampled color channels output from the CFA. The algorithm should have

the following traits:

o Avoidance of the introduction of false color artifacts, such as chromatic aliases, zippering (abrupt

unnatural changes of intensity over a number of neighboring pixels) and purple fringing

o Maximum preservation of the image resolution

o Low computational complexity for fast processing or efficient in-camera hardware implementation

o Amenability to analysis for accurate noise reduction

To reconstruct a full color image from the data collected by the color filtering array, a form

of interpolation is needed to fill in the blanks. The mathematics here is subject to individual

implementation, and is called demosaicing.
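
One of the simplest forms of such interpolation is bilinear demosaicing, sketched below. This is an illustration, not the method of any particular camera: it assumes an RGGB Bayer layout, and the names bayer_masks, mosaic, conv3 and demosaic_bilinear are introduced here for the example.

```python
import numpy as np

KERNEL = np.array([[1.0, 2.0, 1.0],
                   [2.0, 4.0, 2.0],
                   [1.0, 2.0, 1.0]])


def bayer_masks(h, w):
    """Boolean masks (h, w, 3) of the physically sampled sites, assuming an RGGB layout."""
    rows, cols = np.indices((h, w))
    r = (rows % 2 == 0) & (cols % 2 == 0)
    b = (rows % 2 == 1) & (cols % 2 == 1)
    g = ~(r | b)
    return np.stack([r, g, b], axis=-1)


def mosaic(rgb):
    """Keep one colour sample per pixel, as a sensor behind a CFA would."""
    masks = bayer_masks(*rgb.shape[:2])
    return (rgb * masks).sum(axis=-1), masks


def conv3(img, kernel):
    """3x3 filtering with reflected borders (reflection preserves the CFA parity)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="reflect")
    out = np.zeros((h, w))
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def demosaic_bilinear(raw, masks):
    """Fill in the two missing colours per pixel by weighted averaging of known neighbours."""
    out = np.empty(raw.shape + (3,))
    for c in range(3):
        m = masks[..., c].astype(float)
        interp = conv3(raw * m, KERNEL) / conv3(m, KERNEL)   # normalised by available samples
        out[..., c] = np.where(masks[..., c], raw, interp)   # measured samples stay untouched
    return out


rgb = np.random.default_rng(6).random((6, 8, 3))             # hypothetical test image
raw, masks = mosaic(rgb)
reconstructed = demosaic_bilinear(raw, masks)
print("mean absolute error:", np.abs(reconstructed - rgb).mean())
```

Measured samples pass through unchanged and only the missing colours are estimated, which is the distinction between physical and interpolated values that matters for PRNU extraction.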

4.4 DEMOSAICKING IMPACT ON PRNU FIDELITY

In this work, we refer to the colour components physically captured by the sensor as physical colours

and the ones artificially interpolated by the demosaicking function as artificial colours. Due to the

fact that demosaicking is a key deterministic process that affects the quality of colour images taken

by many digital devices, demosaicking has been rigorously investigated. Most demosaicking

approaches group the missing colours before applying an interpolation function. The grouping

process is usually content-dependent, e.g., edge-adaptive or non-adaptive, hence the accuracy of

colour interpolation result is also content-dependent. For example, in a homogeneous area, because

of the low variation of the colour intensities of neighbouring pixels, the interpolation function can

more accurately generate artificial components. Conversely, in inhomogeneous areas, the colour

variation between neighbouring pixels is greater, thus the interpolation noise is also more significant.

This indicates that the PRNU in physical colour components is more reliable than that in the

artificial components. However, the existing method for extracting PRNU as formulated in Eq. (4)

and (5) based on the definition of the output image model in Eq. (1) does not take this into account.

To extract the PRNU using Eq. (4) and (5), the discrete wavelet transform followed by a Wiener

filtering operation is applied. The main problem inherent to Eq. (4) is that it involves the whole

image plane, which contains both artificial and physical components, in one noise residual extraction

process. However, each coefficient of the wavelet transform used in the noise residual extraction

process involves multiple pixels and thus both artificial and physical components. As a result the

interpolation noise gets diffused from the artificial components into the physical ones. For example,

in the red colour component/plane of an image taken by a camera with a Bayer CFA, only one fourth

of the pixels' red colour values are physical and for each pixel with physical red colour all its

8-neighbours' red colours are artificial. When wavelet transform is applied during the noise residual

extraction process the interpolation noise residing in the artificial components propagates into the

physical components. Therefore it is desirable to devise a noise residual extraction method that can

prevent the artificial components from contaminating the reliable PRNU residing in the physical

components with the interpolation noise.
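
The layout argument for the red plane can be verified with a few lines. The sketch assumes an RGGB Bayer layout with red sampled at even rows and even columns; the array size is arbitrary.

```python
import numpy as np

h, w = 8, 8
rows, cols = np.indices((h, w))
red_physical = (rows % 2 == 0) & (cols % 2 == 0)     # assumed RGGB layout

# Count how many of each pixel's 8 neighbours carry a physical red sample.
padded = np.pad(red_physical.astype(int), 1)
neighbour_count = sum(
    padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
    for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)
)

print(red_physical.mean())                    # 0.25: one fourth of the red plane is physical
print(neighbour_count[red_physical].max())    # 0: no physical red among the 8 neighbours
```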

CHAPTER 5

CD-PRNU (Color Decoupled Photo Response Non-Uniformity)

5.1 FORMULATION OF COLOUR DECOUPLED PRNU (CD-PRNU)

In this section, we will discuss the formulation and extraction of CD-PRNU. First, a mathematical

model for the CD-PRNU is derived and then an extraction algorithm is proposed to extract the noise

residual that is to be used for estimating the final CD-PRNU, without prior knowledge about the

CFA.

5.2 Mathematical Model of CD-PRNU

A generic demosaicking process is to convolve an interpolation matrix with an image block of the

same size centred at the pixel where the artificial colour is to be calculated. Although the 2×2 Bayer

CFA is the most common CFA pattern, to make the proposed CD-PRNU versatile and applicable to

cameras adopting different CFA patterns, we make no assumption about the CFA pattern, F, except

that it is a 2 × 2 square array. Let $A$ be an interpolation matrix with (2N+1) × (2N+1) coefficients and

$Y = \{Y_{i,j,c}\}$, $c \in \{R, G, B\}$, be a X × Y-pixel input signal from the scene consisting of three colour

components, R (red), G (green) and B (blue) before colour interpolation. That is to say that for each

pixel $Y_{i,j}$, only one of the three colour components takes a value physically captured by the

sensor and this colour is determined by the colour configuration of the CFA pattern F. The other two

colour components are to be determined by the demosaicking process. For each colour component $c$ of

a pixel $Y_{i,j}$, the output $I_{i,j,c}$ can be determined according to

$$I_{i,j,c} = \begin{cases} Y_{i,j,c}, & \text{if } F(i \bmod 2,\, j \bmod 2) = c \\ \sum_{u=-N}^{N} \sum_{v=-N}^{N} A(u, v)\, Y_{i+u,\, j+v,\, c}, & \text{otherwise} \end{cases} \qquad (7)$$

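The first part of Eq. (7) means that if the colour component c is the same as the colour that the CFA

pattern F allows to pass, i.e. $F(i \bmod 2,\, j \bmod 2) = c$, then no demosaicking is needed because c has

been physically captured by the sensor. Otherwise, the second part of Eq. (7) is artificially applied to

calculate the colour. According to Eq. (7), the image output model of Eq. (1) proposed in [11] can be re-

formulated as

$$I_{i,j,c} = \begin{cases} g^{\gamma} \cdot [(1 + K) Y_{i,j,c} + \Lambda + \Theta_s + \Theta_r]^{\gamma} + \Theta_q, & \text{if } F(i \bmod 2,\, j \bmod 2) = c \\ g^{\gamma} \cdot [(1 + P) Y_{i,j,c} + \Lambda + \Theta_s + \Theta_r]^{\gamma} + \Theta_q, & \text{otherwise} \end{cases} \qquad (8)$$

$$I_{i,j,c} = \begin{cases} I^{(0)}_{i,j,c} + \gamma I^{(0)}_{i,j,c} K + \Theta, & \text{if } F(i \bmod 2,\, j \bmod 2) = c \\ I^{(0)}_{i,j,c} + \gamma I^{(0)}_{i,j,c} P + \Theta, & \text{otherwise} \end{cases} \qquad (9)$$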

Eq. (9) suggests that in the artificial components, the PRNU is actually the interpolation noise P

while, in the physical components, the PRNU remains unaffected by the interpolation noise.

It can also be seen from Eq. (9) that the physical components and artificial components have similar

mathematical expression. Hence if the physical and artificial colour components can be separated /

decoupled, P can be extracted in the same way as the sensor pattern noise K is extracted (i.e., Eq.

(3)). That is

$$P = \frac{W^{a}}{\gamma I^{a,(0)}} \qquad (10)$$

where $I^{a,(0)}$ is a low-pass filtered version of the artificial components $I^{a}$ and $P$ is the corresponding

"sensor pattern noise", which is actually the interpolation noise. We can also use the same ML

estimate as in Eq. (5) to extract the reference interpolation noise $P_d$ for a particular device d from S

low-variation images taken by d such that

$$P_d = \frac{1}{\gamma} \cdot \frac{\sum_{s=1}^{S} W^{a}_{d,s} I^{a}_{d,s}}{\sum_{s=1}^{S} (I^{a}_{d,s})^2} \qquad (11)$$

where $I^{a}_{d,s}$ is the artificial colour components of the s-th low-contrast image taken by device d and

$W^{a}_{d,s}$ is the interpolation noise extracted from $I^{a}_{d,s}$. We will discuss how the physical and artificial

colour components can be decoupled in a simple manner without a priori knowledge about the CFA

pattern in Section III.B.

5.3 CD-PRNU Extraction Algorithm

According to Eq. (10) and (11), we can extract the sensor pattern noise and interpolation noise,

respectively, from the physical and artificial components if the CFA is known. However,

manufacturers usually do not provide information about the CFA used by their cameras. Therefore,

several methods have been proposed to estimate the CFA. Unfortunately, these methods have to

exhaust all of the possible CFA patterns in order to infer/estimate the 'real'/optimal CFA. However,

exhaustive search is by no means acceptable. In this work, to extract the CD-PRNU, we first

separate the three colour channels $I_c$, $c \in \{R, G, B\}$, of a colour image I of X × Y pixels. Most CFA

patterns are of 2 × 2 elements and are periodically mapped to the sensors. We know that, for each

pixel of I, only one of the three colour components is physical and the other two are artificial, so the

second step is, for each channel $I_c$, to perform a 2:1 down-sampling across both horizontal and

vertical dimensions to get four sub-images, $I_{c,m,n}$, $m, n \in \{0, 1\}$, such that

$$I_{c,m,n}(x, y) = I_c(2x + m,\, 2y + n) \qquad (12)$$

For each colour channel, $I_c$, without knowing the CFA pattern used by the manufacturer, we do not

know (actually we do not have to know) which pixels carry the colour captured physically by the

hardware and which do not. But by decomposing $I_c$ into four sub-images, $I_{c,m,n}$, we know that each

of the four sub-images either contains only the physical colour or only the artificial colours. By

decoupling the physical and artificial colour components in this fashion before extracting the noise

residual, we can prevent the artificial components from contaminating the physical components

during the DWT process. Eq. (4) is then used to obtain the noise residual $W_{c,m,n}$ from each sub-image

$I_{c,m,n}$. Finally the CD-PRNU $W_c$ of each colour channel c is formed by combining the

four sub-noise residuals such that

$$W_c(x, y) = W_{c,m,n}(\lfloor x/2 \rfloor,\, \lfloor y/2 \rfloor) \qquad (13)$$

where $m = x \bmod 2$, $n = y \bmod 2$, and mod is the modulo operation. The framework of the colour decoupled

noise residual extraction process is shown in Figure 2 and the procedures are listed in Algorithm 1.

Note that Algorithm 1 is for extracting the noise residual pattern W from an image I. To estimate the

CD-PRNU Pd of a particular device d and use it as the reference signature of d, Eq. (11) is applied.

5.4 Algorithm 1. Noise residual extraction algorithm

Input: original image I

Output: colour decoupled noise residual W

Noise residual extraction algorithm
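
A minimal sketch of the extraction procedure described in Section 5.3 is given below. It is an illustration under stated assumptions rather than a transcription of Algorithm 1: the 3x3 mean filter box_denoise is a stand-in for the wavelet-based denoiser, the function names are introduced here, and the image dimensions are assumed to be at least 4 × 4.

```python
import numpy as np


def box_denoise(img, radius=1):
    """Stand-in denoiser: mean filter. NOT the wavelet/Wiener filter of the text."""
    h, w = img.shape
    size = 2 * radius + 1
    padded = np.pad(img, radius, mode="reflect")
    out = np.zeros((h, w))
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + h, dx:dx + w]
    return out / size**2


def noise_residual(img):
    """Noise residual of one sub-image: the sub-image minus its denoised version."""
    return img - box_denoise(img)


def cd_noise_residual(image):
    """Colour-decoupled noise residual W for an (H, W, 3) image I."""
    image = np.asarray(image, dtype=float)
    W = np.zeros_like(image)
    for c in range(image.shape[2]):                 # each colour channel separately
        for m in (0, 1):
            for n in (0, 1):
                sub = image[m::2, n::2, c]          # 2:1 down-sampling: one of four sub-images
                W[m::2, n::2, c] = noise_residual(sub)   # put the residual back in place
    return W


image = np.random.default_rng(7).random((16, 16, 3))    # hypothetical input image
W = cd_noise_residual(image)
print(W.shape)
```

Because each sub-image is filtered on its own, a change confined to one sub-lattice of a channel cannot alter the residual computed for the other three, which is the decoupling property the method relies on.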

5.5 EXPERIMENTAL RESULTS

In this section, we carry out experiments on source camera identification and image content integrity

verification to validate the feasibility of the proposed CD-PRNU in a comparative manner.

5.5.1. Source Camera Identification

We have carried out source camera identification tests on 300 2048×1536-pixel photos of natural

scenes taken by six cameras (C1 to C6), each responsible for 50. The six cameras are listed in Table 1.

Table 1. Cameras used in the experiments.

The reference PRNU (i.e. $P_{C_i}$) of each camera Ci is generated by taking the weighted average

of the PRNUs extracted from 30 photos of blue sky according to Eq. (11). For device identification

purpose, we need clean PRNUs (which appear as high frequency bands of images) as device

fingerprints for comparison against the PRNU extracted from individual images under investigation.

The reason blue-sky images are chosen in this work is that blue sky contains fewer scene details

(high frequency signal), thus giving better chance of extracting clean PRNU. Actually, other images

with low-variation scenes (i.e., scenes without significant details) can be used instead. Taking the

average of the PRNUs from 30 blue sky images is to further reduce variation. Our empirical

experience suggests that an average of 20 blue sky images is accurate enough.

Source camera identification requires similarity comparisons among PRNUs (CD-PRNUs) and

therefore the feasibility of the chosen similarity metrics is important. Fridrich suggested the use of

the Peak to Correlation Energy (PCE) measure in [15], which has been proved to be a more stable

detection statistic than normalised cross-correlation when applied to the scenarios in which the

images of interest may have undergone geometrical manipulations, such as rotation or scaling. The

purpose of this experiment is to demonstrate the capability of the proposed CD-PRNU in dealing

with the colour interpolation noise, so geometrical transformations will not be applied in order to

prevent biased evaluation from happening. Therefore, in the following experiments, normalised

cross-correlation formulated as in Eq. (6) will be used to measure the similarity between PRNUs

(CD-PRNUs).

In practice, the normalised cross-correlation has to be greater than a specified threshold for a camera

to be identified as the source camera. However, in this experiment, the key point is to

demonstrate the difference in performance between the traditional PRNU and the proposed CD-PRNU.

Therefore, a camera is identified as the source camera, if out of the six reference PRNUs (or CD-

PRNUs), its reference PRNU (or CD-PRNU) is most similar to the PRNU (or CD-PRNU), WI, of

the image I under investigation.
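
The decision rule above (pick the camera whose reference pattern is most similar to WI) can be sketched as follows. The standard zero-mean normalised cross-correlation is assumed to match Eq. (6); the function names and the synthetic fingerprints are hypothetical.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalised cross-correlation of two equally sized patterns.

    Assumption: this standard form is taken to correspond to Eq. (6).
    """
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def identify_source(w_image, references):
    """Return the camera whose reference pattern is most similar to w_image."""
    scores = {name: ncc(w_image, ref) for name, ref in references.items()}
    return max(scores, key=scores.get), scores

# Synthetic demo: six random reference patterns, and an image residual that
# contains the pattern of C3 plus independent noise.
rng = np.random.default_rng(1)
references = {f"C{i}": rng.normal(size=(48, 64)) for i in range(1, 7)}
w_image = references["C3"] + 2.0 * rng.normal(size=(48, 64))
best, scores = identify_source(w_image, references)
print(best)
```

No threshold is applied here, mirroring the experiment: the camera with the highest similarity is always chosen.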

Because PRNU is often used in content integrity verification, where smaller image blocks have to be

analysed, we also compare the performance of the proposed CD-PRNU against that of the traditional

PRNU [11] when they are applied to blocks of 5 different sizes cropped from the centre of the full-

sized PRNU (CD-PRNU). Table 2 lists the identification rates. Individually speaking, C1, C3, C4,

C5 and C6 perform significantly better when CD-PRNU is used in all cases, except for a few cases

when images are of full size (1536 × 2048 pixels) and the identification rates are close or equal to

100% (1.0000). For C2, PRNU performs as well as CD-PRNU when the image size is 192 ×

256 pixels and slightly outperforms CD-PRNU when the block size is 48 × 64 pixels. We suspect

that C2 does not perform as expected because its CFA pattern is not a 2 × 2 square

array as we have assumed. Another reason is that smaller images provide less data,

so identification results become less reliable. Generally speaking,

when the statistics of the six cameras are pooled together, as listed in the Total column of Table 2,

we can see that CD-PRNU still outperforms PRNU significantly. This has been graphically

presented in Figure 3(a).
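
Cropping a block from the centre of a full-sized pattern, as done for the block sizes in Table 2, can be sketched as follows. The helper name `centre_crop` is hypothetical, and only a block size mentioned in the text (192 × 256) is used.

```python
import numpy as np

def centre_crop(pattern, height, width):
    """Return the height x width block at the centre of a 2-D pattern."""
    rows, cols = pattern.shape[:2]
    if height > rows or width > cols:
        raise ValueError("block is larger than the pattern")
    top = (rows - height) // 2
    left = (cols - width) // 2
    return pattern[top:top + height, left:left + width]

# A stand-in for a full-sized 1536 x 2048 pattern.
full = np.arange(1536 * 2048, dtype=np.float32).reshape(1536, 2048)
block = centre_crop(full, 192, 256)
print(block.shape)
```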

Figure 3. Performance comparison of source camera identification a) Overall identification rates

when CD-PRNU and PRNU are used as fingerprint

Figure 3(b) shows the ROC curves of PRNU and CD-PRNU. We can

see that the CD-PRNU outperforms the PRNU because, at every fixed False Positive rate, the CD-

PRNU's True Positive rate is higher than that of the PRNU.

Figure 3. Performance comparison of source camera identification b) Overall ROC curve when CD-

PRNU and PRNU are used as fingerprint

For a system with a Pentium Core II 1.3G CPU and 3 GB RAM, it takes 0.526 seconds to compute

the similarity between the PRNUs of two images of 2048 × 1536 pixels and 0.567 seconds to

calculate the similarity between a pair of CD-PRNUs of the same size. The amount of data processed

during the extraction of PRNU and CD-PRNU is the same. Although extracting CD-PRNU requires

down-sampling and up-sampling, these two operations are trivial and incur only a negligible increase

in time complexity.

Table 2. Source camera identification rates using traditional PRNU and proposed CD-PRNU.

5.5.2 Content Integrity Verification

We also carried out the following three content integrity verification experiments on 640 × 480-pixel

images.

In the first experiment, we copied a 160 × 390-pixel area from Image I.1 in Figure 4(a), and

pasted it at approximately the same location in Image I.2 in Figure 4(b) to create the forged

Image I.3 as shown in Figure 4(c). The images in Figure 4(a) and (b) are taken by Olympus

C730.

Figure 4. The original image, source image and forged images for the content verification

experiments. (a) Original Image I.1 (b) Original Image I.2 (c) Forged Image I.3

In the second experiment, we cropped an 80 × 100-pixel area from Image II.1 in Figure 5(a),

which covers the face of the person, and pasted it at the area where the face of another person is

in Image II.2 in Figure 5(b) to create the forged Image II.3 in Figure 5(c). The images in

Figure 5(a) and (b) are also taken by the same camera.


Figure 5. The original image, source image and forged images for the content verification

experiments. (a) Original Image II.1 (b) Original Image II.2 (c) Forged Image II.3

In the third experiment, we cropped a 60 × 80-pixel area from Image III.1 in Figure 6(a)

taken by Canon Power Shot A400, which covers the face of the person, and pasted it at the area

where the face of another person is in Image III.2 in Figure 6(b), which is taken by Olympus

C730, to create the forged Image III.3 in Figure 6(c).


Figure 6. The original image, source image and forged images for the content verification

experiments. (a) Original Image III.1 (b) Original Image III.2 (c) Forged Image III.3

To detect the manipulated areas, we slid a 128 × 128-pixel window across the PRNU extracted from

the image under investigation and another window of the same size across the reference PRNU of

the cameras that have taken images I.2, II.2 and III.2. In Chen’s method [11], the windows are

moved a pixel at a time, which incurs a high computational load. Moreover, this method is not

accurate at the pixel level [11]. Therefore, in our experiment, the sliding step/displacement is set to 5

pixels in order to reduce the computational load without sacrificing the accuracy of the integrity

verification. Table 3 lists the number of manipulated and non-manipulated blocks of 5 × 5 pixels in

the forged images.
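
The sliding-window comparison described above can be sketched as follows. This is an illustration under assumptions: the standard zero-mean normalised cross-correlation stands in for Eq. (6), the function names are hypothetical, and a small synthetic pattern is used instead of a 640 × 480 image to keep the demo fast.

```python
import numpy as np

def ncc(a, b):
    """Zero-mean normalised cross-correlation (assumed form of Eq. (6))."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    a = a - a.mean()
    b = b - b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def sliding_correlation(w_image, w_ref, win=128, step=5):
    """Correlate co-located windows of two patterns, moving `step` pixels at a time."""
    rows = range(0, w_image.shape[0] - win + 1, step)
    cols = range(0, w_image.shape[1] - win + 1, step)
    out = np.empty((len(rows), len(cols)))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            out[i, j] = ncc(w_image[r:r + win, c:c + win],
                            w_ref[r:r + win, c:c + win])
    return out

# Synthetic demo: the image residual shares the reference pattern on the left
# half, while the right half is replaced by unrelated noise ("tampered").
rng = np.random.default_rng(2)
w_ref = rng.normal(size=(200, 300))
w_image = w_ref + rng.normal(size=w_ref.shape)
w_image[:, 150:] = rng.normal(size=(200, 150))

corr = sliding_correlation(w_image, w_ref)
print(corr.shape, corr[0, 0], corr[0, -1])
```

Each entry of `corr` corresponds to one window position, so a step of 5 pixels yields one decision per 5 × 5 block rather than one per pixel.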

Table 3. Number of manipulated and non-manipulated areas in each image (unit: block).

To decide whether a block centered at the window superposed on the image has been manipulated or

not, the cross-correlation of the PRNU patterns inside the two windows at the same location was

calculated according to Eq. (6). If the cross-correlation is lower than a predetermined threshold t, the

block in the centre of the window is deemed as manipulated. As discussed in [11], the cross-correlation

follows the Generalized Gaussian (GG) distribution; therefore, we use various thresholds defined as

T(t) = μ − tσ

to analyze the performance of PRNU and CD-PRNU, where μ and σ are the mean and standard deviation

of the correlation distribution, respectively, and T(t) is the threshold. By varying the value of t, we

can evaluate the integrity verification performance across a wide range of correlation thresholds T(t).

In the following experiments we will allow t to vary independently in the range from 0.0 to 3.0 and

use the four metrics, true positive (TP), false positive (FP), true negative (TN) and false negative

(FN) to measure the performance of integrity verifications based on PRNU and CD-PRNU. As t

grows, TP and FP decrease while TN and FN increase. Let B be an arbitrary block and

M(B) and Md(B) be defined as

M(B) = 1 if B is manipulated, and 0 otherwise;

Md(B) = 1 if B is detected as manipulated, and 0 otherwise.

TP, FP, TN and FN are defined as TP = |{B | M(B) = 1 and Md(B) = 1}|, TN = |{B | M(B) = 0 and

Md(B) = 0}|, FP = |{B | M(B) = 0 and Md(B) = 1}| and FN = |{B | M(B) = 1 and Md(B) = 0}|. Higher

TP and TN, and lower FP and FN indicate better performance.
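
The counting of TP, FP, TN and FN for a given t can be sketched as follows. Two assumptions are made: the threshold takes the form T(t) = μ − tσ, which is consistent with TP and FP falling as t grows, and M(B) marks truly manipulated blocks while Md(B) marks blocks detected as manipulated. The function name and the numbers are hypothetical.

```python
import numpy as np

def confusion_counts(corr, truth, t, mu, sigma):
    """Count (TP, FP, TN, FN) over blocks for one value of t.

    corr  : per-block correlation values
    truth : M(B), 1 where the block is really manipulated
    Assumption: the threshold is T(t) = mu - t * sigma.
    """
    corr = np.asarray(corr, dtype=np.float64)
    truth = np.asarray(truth, dtype=bool)
    detected = corr < mu - t * sigma      # Md(B) = 1
    tp = int(np.sum(truth & detected))
    fp = int(np.sum(~truth & detected))
    tn = int(np.sum(~truth & ~detected))
    fn = int(np.sum(truth & ~detected))
    return tp, fp, tn, fn

# Six illustrative blocks: the first two are manipulated.
corr = [0.02, 0.12, 0.30, 0.36, 0.40, 0.10]
truth = [1, 1, 0, 0, 0, 0]
mu, sigma = 0.35, 0.10                    # assumed distribution parameters

for t in (0.0, 2.0, 3.0):
    print(t, confusion_counts(corr, truth, t, mu, sigma))
```

In this toy example, raising t from 0.0 to 3.0 lowers both TP and FP, in line with the trend stated in the text.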

According to Chen's prediction, “the block dimensions impose a lower bound on the size of

tampered regions that our algorithm can identify. Thus, we remove all simply connected tampered

regions from Z that contain less than 64×64 pixels (one quarter of the number of pixels in the

block)”. Chen applies erosion and dilation operations with a square kernel in order to filter small

areas identified as tampered with. The final authentication result is an image with the dilated areas

highlighted as the tampered areas. However, the performance of the filtering / dilation operation

strongly depends on parameter setting and hence many experiments must be run to obtain the best

parameters for filtering. In order to simplify the comparison and to obtain a fair result, we use the

raw data without any filtering to calculate the TP, TN, FP and FN. As a result, the experiments on

III.3 demonstrate that CD-PRNU-based method significantly outperforms the PRNU-based method

when the tampered area is about one quarter of the sliding window.

5.5.2.1 Experiment on Image I.3

Figure 7 shows the performance of the PRNU and CD-PRNU in terms of TP, TN, FP and FN when

authentication is carried out on image I.3 across a range of correlation threshold T(t). We can see

from Figure 7(a) and 7(b) that CD-PRNU generally achieves higher TP and TN while maintaining

lower FP and FN. A smaller t allows the algorithm to flag more blocks as manipulated,

leading to higher TP. However, a smaller t also results in the situation where more

authentic blocks are mistakenly detected as manipulated, giving rise to a higher FP. Therefore a

ROC curve of TP rate with respect to FP rate can be used to evaluate the overall performance of the

PRNU and CD-PRNU. Let α be the number of manipulated blocks and β be the number of authentic

blocks; the ROC is formulated as

TPR = TP / α, FPR = FP / β

At the same false positive rate (FPR), which is marked along the horizontal axis of the ROC curve,

an algorithm with better performance will have a higher true positive rate (TPR), which is marked

vertically. The ROC curves for the integrity verification experiments on image I.3 are illustrated in

Figure 8. It is clear that the ROC curve of the PRNU-based scheme mostly overlaps with that of

Random Guess, which means the authentication result is generally as unreliable as that of a random

guess. This is because the area we copied from the source image I.1 is at approximately the same

location as the original area in image I.2; therefore the PRNU pattern noises in the two areas are

almost the same. As a result, the scheme cannot detect the manipulated area based on PRNU. By

contrast, the CD-PRNU-based scheme results in a curve much higher than the PRNU-based method,

which means that by using CD-PRNU manipulated blocks can be detected more reliably.
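
Tracing a ROC curve by sweeping t can be sketched as follows. The rates are taken as TP/α and FP/β, with α and β the numbers of manipulated and authentic blocks. The threshold form T(t) = μ − tσ, the use of the authentic blocks to estimate μ and σ, the function name and the synthetic correlations are all assumptions made for illustration.

```python
import numpy as np

def roc_points(corr, truth, mu, sigma, ts):
    """Return (FPR, TPR) pairs for each t, assuming T(t) = mu - t * sigma."""
    corr = np.asarray(corr, dtype=np.float64)
    truth = np.asarray(truth, dtype=bool)
    alpha = truth.sum()        # number of manipulated blocks
    beta = (~truth).sum()      # number of authentic blocks
    points = []
    for t in ts:
        detected = corr < mu - t * sigma
        tpr = np.sum(truth & detected) / alpha
        fpr = np.sum(~truth & detected) / beta
        points.append((float(fpr), float(tpr)))
    return points

# Synthetic correlations: authentic blocks correlate well with the reference,
# manipulated blocks do not.
rng = np.random.default_rng(3)
authentic = rng.normal(0.30, 0.05, size=1000)
manipulated = rng.normal(0.05, 0.05, size=200)
corr = np.concatenate([authentic, manipulated])
truth = np.concatenate([np.zeros(1000, dtype=bool), np.ones(200, dtype=bool)])

ts = np.linspace(0.0, 3.0, 31)
points = roc_points(corr, truth, authentic.mean(), authentic.std(), ts)
print(points[0], points[-1])
```

A curve lying above the diagonal TPR = FPR indicates detection that is better than a random guess.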

Figure 7. Authentication results on image I.3: Integrity verification performance of the PRNU and

CD-PRNU in terms of a) TP, b) TN, across a range of correlation threshold T(t), with t varying from

0.0 to 3.0.

Figure 8. The ROC curve of True Positive Rate with respect to False Positive Rate of PRNU and

CD-PRNU when authentication is performed on image I.3.

5.5.2.2 Experiment on Image II.3

When verifying the integrity of image II.3, CD-PRNU's consistently higher TP and lower FN, as

shown in Figure 9(a) and 9(d), again indicate its superiority to PRNU. However, mixed performance

in terms of TN and FP can be seen in Figure 9(b) and 9(c). Despite their mixed performance in terms

of TN and FP, both PRNU and CD-PRNU can effectively detect the manipulated blocks as their

ROC curves have suggested in Figure 10. Figure 10 also shows that the ROC curve of CD-PRNU is

still slightly higher than that of PRNU, indicating a slightly better performance of CD-PRNU.

Figure 9. Authentication results on image II.3: Integrity verification performance of the PRNU and

CD-PRNU in terms of a) TP, b) TN, c) FP and d) FN across a range of correlation threshold T(t),

with t varying from 0.0 to 3.0.

Figure 10. The ROC curve of True Positive Rate with respect to False Positive Rate of PRNU and

CD-PRNU when authentication is performed on image II.3.

5.5.2.3 Experiment on Image III.3

When authenticating III.3, although the performance of PRNU and CD-PRNU in terms of TN and

FP is mixed, as can be seen in Figure 11(b) and 11(c), CD-PRNU's significantly higher

TP and lower FN can still be seen in Figure 11(a) and 11(d),

respectively. When the threshold t is higher than 1.1, the PRNU cannot correctly detect any

manipulated blocks (i.e. TP = 0), as demonstrated in Figure 11(a). This poor performance is also

reflected in the PRNU's ROC curve in Figure 12 and is due to the fact that the manipulated area is

too small (60 × 80 pixels), which is only about one quarter of the sliding window (128 × 128 pixels).

Chen predicted in [11] that one quarter of the sliding window is the lower bound on the size of tampered

regions that their algorithm can identify, and therefore areas smaller than this should be filtered in

order to remove the falsely identified noise. The experiment result on III.3 conforms to Chen’s

observation. Since the tampered area is 60 × 80 pixels, approximately one quarter of the window, the

method based on PRNU can perform no better than a random guess. By contrast, the manipulated

blocks can be effectively detected by the CD-PRNU-based scheme because the areas in question are

from two images taken by different cameras and thus contain different interpolation noise. As a

result, the CD-PRNU-based method can identify smaller areas.

Figure 11. Authentication results on image III.3: Integrity verification performance of the PRNU and

CD-PRNU in terms of a) TP, b) TN, c) FP and d) FN across a range of correlation threshold T(t),

with t varying from 0.0 to 3.0.
