Advanced Artificial Intelligence Creative Design | Chung-Ang University | Narration: Prof. Jaesung Lee
Digital Image Processing
Presenter: Nguyen The Vi
Good afternoon, Professor and everyone. Today I am delighted to be here to talk about Digital Image Processing (DIP).
A digital image is a representation of a two-dimensional image as a finite set of digital values, called picture elements or pixels. Pixel values typically represent gray levels, colors, heights, opacities, etc. We should also remember that digitization implies that a digital image is an approximation of a real scene.
Advanced AI | Chung-Ang University | Presenter: Nguyen The Vi
What is a digital image?
Image processing is a method to convert an image into digital form and perform some operations on it, in order to get an enhanced image or to extract some useful information from it. It is a type of signal processing in which the input is an image, such as a video frame or photograph, and the output may be an image or characteristics associated with that image.
What is digital image processing?
Digital image processing focuses on two major tasks: improvement of pictorial information for human interpretation, and processing of image data for storage, transmission, and representation for autonomous machine perception.
What is digital image processing?
Among the most common uses of digital image processing techniques are improving image quality, removing noise, etc. Major uses of imaging based on X-rays include medicine and astronomical observations. X-rays are among the oldest sources of radiation used for imaging. The best-known use of X-rays is medical diagnostics, but they are also used extensively in industry and other areas, such as astronomy.
Examples
Another major area of visual processing is remote sensing, which usually involves several bands in the visible and infrared regions of the spectrum.
Examples
Another application is artistic effects in movies, which are used to make images more visually appealing and to add special effects to create composite images.
Examples
Other applications of digital image processing in the visual spectrum include automated counting and, in law enforcement, the reading of serial numbers for the purpose of tracking and identifying bills.
Examples
Another application of DIP is face recognition and gesture recognition.
Examples
Now we will discuss the key stages in DIP.
Key stages in DIP
The first stage is image acquisition, which could be as simple as being given an image that is already in digital form.
Key stages in DIP
The second stage is Image Enhancement. This is the process of manipulating an image so that the result is more suitable than the original for a specific application. The word specific is important here, because it establishes at the outset that enhancement techniques are problem oriented. Thus, for example, a method that is quite useful for enhancing X-ray images may not be the best approach for enhancing satellite images taken in the infrared band of the electromagnetic spectrum.
Key stages in DIP
The objective of image enhancement is to process the image (e.g. contrast improvement, image sharpening, etc.) so that it is better suited for further processing or analysis. Image enhancement methods are based on subjective image quality criteria, meaning that no objective mathematical criteria are used for optimizing processing results.
Image enhancement
There are several methods for solving the enhancement problem, including point processing, spatial filtering, and image colouring.
Image enhancement
Contrast enhancements improve the perceptibility of objects in the scene by enhancing the brightness difference between objects and their backgrounds. Contrast enhancements are typically performed as a contrast stretch followed by a tonal enhancement, although both could be performed in one step. Most contrast enhancement methods make use of the gray-level histogram.
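A simple contrast stretch can be sketched as follows. This is an illustrative NumPy helper, not from the slides; the function name and the output range [0, 255] are assumptions:

```python
import numpy as np

def contrast_stretch(image, out_min=0, out_max=255):
    """Linearly map the image's gray-level range onto [out_min, out_max]."""
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:                      # flat image: nothing to stretch
        return np.full_like(image, out_min)
    stretched = (image.astype(float) - lo) / (hi - lo)
    return np.round(stretched * (out_max - out_min) + out_min).astype(np.uint8)

# A low-contrast patch whose gray levels span only 100..130
dim = np.array([[100, 110], [120, 130]], dtype=np.uint8)
bright = contrast_stretch(dim)        # now spans the full 0..255 range
```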
Image enhancement
Most contrast enhancement methods make use of the gray-level histogram, created by counting the number of times each gray-level value occurs in the image, then dividing by the total number of pixels in the image to create a distribution of the percentage of each gray level in the image. The gray-level histogram describes the statistical distribution of the gray levels in the image but contains no spatial information about the image.
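The normalized gray-level histogram described above can be computed in a few lines. This is a sketch using NumPy; the helper name is illustrative:

```python
import numpy as np

def gray_level_histogram(image, levels=256):
    """Count how often each gray level occurs, then divide by the total
    number of pixels to get the distribution of gray-level percentages."""
    hist = np.bincount(image.ravel(), minlength=levels).astype(float)
    return hist / image.size

# A tiny 2x4 "image" with gray levels 0..3
img = np.array([[0, 0, 1, 2],
                [1, 1, 3, 0]], dtype=np.uint8)
h = gray_level_histogram(img, levels=4)
# h sums to 1; note that h carries no spatial information about the image
```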
Image enhancement
Spatial filtering refers to image operators that change the gray value at any pixel (x,y) depending on the pixels in a square neighborhood centered at (x,y), using a fixed matrix of the same size. The matrix is called a filter, mask, kernel, or window.
Spatial filtering
The concept of filtering has its roots in the use of the Fourier transform for signal processing in the so-called frequency domain. The term spatial filtering refers to filtering operations that are performed directly on the pixels of an image. The process consists simply of moving the filter mask from point to point in the image.
Spatial filtering
The mechanism of spatial filtering consists of moving the filter mask from pixel to pixel in an image. At each pixel (x,y), the response of the filter is calculated using a predefined relationship (linear or nonlinear).
Spatial filtering
We consider linear spatial filtering, which we call convolution. It is the process that consists of moving the filter mask from pixel to pixel in an image. At each pixel (x,y), the response is given by a sum of products of the filter coefficients and the corresponding image pixels in the area spanned by the filter mask.
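The sum-of-products mechanism can be sketched directly. This is an illustrative NumPy implementation; zero padding at the image borders is an assumption the slides do not specify:

```python
import numpy as np

def filter2d(image, mask):
    """Slide the mask over the image; at each pixel, the response is the
    sum of products of the mask coefficients and the pixels under the mask.
    Borders are handled by zero padding."""
    m, n = mask.shape
    pad_y, pad_x = m // 2, n // 2
    padded = np.pad(image.astype(float), ((pad_y, pad_y), (pad_x, pad_x)))
    out = np.zeros(image.shape, dtype=float)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            region = padded[y:y + m, x:x + n]
            out[y, x] = np.sum(region * mask)
    return out

avg = np.ones((3, 3)) / 9.0            # 3x3 averaging (smoothing) mask
img = np.zeros((5, 5)); img[2, 2] = 9.0  # a single bright pixel
smoothed = filter2d(img, avg)          # spreads it over a 3x3 area
```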
Spatial filtering
Next, we consider nonlinear spatial filtering. The operation also consists of moving the filter mask from pixel to pixel in an image. The filtering operation is based conditionally on the values of the pixels in the neighborhood, and it does not explicitly use coefficients in the sum-of-products manner. For example, noise reduction can be achieved effectively with a nonlinear filter whose basic function is to compute the median gray-level value in the neighborhood in which the filter is located.
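A median filter along these lines might look like this. It is a sketch; handling edges by replicating the border pixels is an assumption:

```python
import numpy as np

def median_filter(image, size=3):
    """Replace each pixel with the median gray level of its size x size
    neighbourhood (edge pixels are handled by border replication)."""
    pad = size // 2
    padded = np.pad(image, pad, mode='edge')
    out = np.empty_like(image)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            out[y, x] = np.median(padded[y:y + size, x:x + size])
    return out

# A flat gray patch corrupted by one "salt" pixel
img = np.full((5, 5), 10, dtype=np.uint8)
img[2, 2] = 255
clean = median_filter(img)   # the outlier is replaced by the median, 10
```

Unlike the averaging mask above, the median does not smear the outlier across its neighbours; it removes it outright, which is why median filtering works so well on impulse noise.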
Spatial filtering
The third stage is Image Restoration, which is the operation of taking a corrupt/noisy image and estimating the clean, original image. Corruption may come in many forms, such as motion blur, noise, and camera misfocus. Image restoration is performed by reversing the process that blurred the image; this is done by imaging a point source and using the point-source image, called the Point Spread Function (PSF), to restore the image information lost to the blurring process.
Image Restoration
Restoration models the degradation and applies the inverse process in order to recover the original image. The principal goal of restoration techniques is to improve an image in some predefined sense. Although there are areas of overlap, image enhancement is largely a subjective process, while restoration is for the most part an objective process.
Image Restoration
The problem is how to estimate the degradation function. This problem can be tackled by building a mathematical model of the degradation, as shown in the figure above, and reproducing the degradation process on a known image. In the degradation model for blurring an image, the image is blurred using different kinds of filters and an additive noise. The image can be degraded using salt-and-pepper noise and Gaussian noise.
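The two noise models mentioned can be sketched as follows. These are illustrative NumPy helpers; the noise parameters (sigma, amount) are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_gaussian_noise(image, sigma=10.0):
    """Additive zero-mean Gaussian noise, clipped back to [0, 255]."""
    noisy = image.astype(float) + rng.normal(0.0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def add_salt_and_pepper(image, amount=0.1):
    """Flip a random fraction of pixels to white (salt) or black (pepper)."""
    noisy = image.copy()
    coins = rng.random(image.shape)
    noisy[coins < amount / 2] = 0            # pepper
    noisy[coins > 1 - amount / 2] = 255      # salt
    return noisy

img = np.full((64, 64), 128, dtype=np.uint8)   # a known, flat test image
g = add_gaussian_noise(img)
sp = add_salt_and_pepper(img)
```

Degrading a known image like this lets you compare a restoration filter's output directly against the ground truth.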
Image Restoration
Restoration is obtained by processing the degraded image with restoration filters. In this process, noise and blur are removed and we obtain an estimate of the original image.
Image Restoration
The fourth stage is Morphological Processing, which provides tools for extracting image components that are useful in the representation and description of shape. Morphological image processing is a collection of non-linear operations related to the shape or morphology of features in an image.
Key stages in DIP
Morphological techniques probe an image with a small shape or template called a structuring element. The structuring element is positioned at all possible locations in the image and compared with the corresponding neighbourhood of pixels. Some operations test whether the element "fits" within the neighbourhood, while others test whether it "hits" or intersects the neighbourhood. A morphological operation on a binary image creates a new binary image in which a pixel has a non-zero value only if the test is successful at that location in the input image.
Morphological image processing
When a structuring element is placed in a binary image, each of its pixels is associated with the corresponding pixel of the neighbourhood under the structuring element. The structuring element is said to fit the image if, for each of its pixels set to 1, the corresponding image pixel is also 1. Similarly, a structuring element is said to hit, or intersect, an image if, for at least one of its pixels set to 1, the corresponding image pixel is also 1. Zero-valued pixels of the structuring element are ignored, i.e. they indicate points where the corresponding image value is irrelevant.
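The "fit" test can be turned into a small binary erosion routine. This is a sketch; zero padding outside the image is an assumption:

```python
import numpy as np

def erode(image, se):
    """Binary erosion: the output pixel is 1 only where the structuring
    element 'fits', i.e. every 1-pixel of the SE lies over a 1 in the image."""
    m, n = se.shape
    pad_y, pad_x = m // 2, n // 2
    padded = np.pad(image, ((pad_y, pad_y), (pad_x, pad_x)))
    out = np.zeros_like(image)
    for y in range(image.shape[0]):
        for x in range(image.shape[1]):
            region = padded[y:y + m, x:x + n]
            if np.all(region[se == 1] == 1):   # the "fit" test
                out[y, x] = 1
    return out

se = np.ones((3, 3), dtype=int)          # 3x3 square structuring element
img = np.zeros((5, 5), dtype=int)
img[1:4, 1:4] = 1                        # a 3x3 white square
eroded = erode(img, se)                  # shrinks to the single centre pixel
```

Replacing `np.all` with `np.any` in the test would give the "hit" condition, i.e. dilation-style behaviour.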
Morphological image processing
The fifth stage is Segmentation, whose procedures partition an image into its constituent parts or objects. In this step, we segment the image, separating the background from the foreground.
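A minimal way to separate background from foreground is global thresholding. This is an illustrative sketch; the threshold value is an assumption:

```python
import numpy as np

def threshold_segment(image, t):
    """Label each pixel foreground (1) or background (0) by comparing
    its gray level with the threshold t."""
    return (image > t).astype(np.uint8)

# Dark background (level 20) with a brighter object (level 200) in the middle
img = np.full((6, 6), 20, dtype=np.uint8)
img[2:4, 2:4] = 200
mask = threshold_segment(img, t=100)   # 1 exactly where the object is
```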
Segmentation
Let’s understand image segmentation using a simple example. Consider the image above on the left-hand side. There’s only one object here – a dog. We can build a straightforward cat-dog classifier model and predict that there’s a dog in the given image. But what if we have both a cat and a dog in a single image? We can train a multi-label classifier in that instance. Now, there’s another caveat – we won’t know the location of either animal/object in the image.
Segmentation
We can divide or partition the image into various parts called segments. It’s not a great idea to process the entire image at the same time as there will be regions in the image which do not contain any information. By dividing the image into segments, we can make use of the important segments for processing the image. That, in a nutshell, is how image segmentation works. An image is a collection or set of different pixels. We group together the pixels that have similar attributes using image segmentation. Take a moment to go through the below visual (it’ll give you a practical idea of image segmentation).
So how does image segmentation work?
We can broadly divide image segmentation techniques into two types. Consider the above images. Can you identify the difference between these two? Both the images are using image segmentation to identify and locate the people present. In image 1, every pixel belongs to a particular class (either background or person). Also, all the pixels belonging to a particular class are represented by the same color (background as black and person as pink). This is an example of semantic segmentation.
The Different Types of Image Segmentation
Image 2 has also assigned a particular class to each pixel of the image. However, different objects of the same class have different colors (Person 1 as red, Person 2 as green, background as black, etc.). This is an example of instance segmentation. Let me quickly summarize what we’ve learned: if there are 5 people in an image, semantic segmentation will classify all the people as a single class. Instance segmentation, on the other hand, will identify each of these people individually.
The Different Types of Image Segmentation
The sixth stage is Representation and Description, which almost always follows the output of a segmentation stage. That output usually is raw pixel data, constituting either the boundary of a region (i.e., the set of pixels separating one image region from another) or all the points in the region itself.
Representation and Description
The result of segmentation is a set of regions. These regions then have to be represented and described. There are two main ways of representing a region: by its external characteristics (its boundary), focusing on shape, or by its internal characteristics (its internal pixels), focusing on color, texture, etc. The next step is description. For example, a region may be represented by its boundary, and its boundary described by features such as length and regularity. Features should be insensitive to translation, rotation, and scaling. Boundary and regional descriptors are often used together.
Representation and Description
In order to represent a boundary, it is useful to compact the raw data (the list of boundary pixels). A chain code is a list of segments with defined length and direction; there are 4-directional and 8-directional chain codes. It may be useful to downsample the data before computing the chain code, to reduce the code dimension and to remove small detail along the boundary.
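A 4-directional chain code can be computed from an ordered list of boundary pixels as follows. This is a sketch; the direction numbering (0 = right, 1 = up, 2 = left, 3 = down) is one common convention:

```python
# 4-directional chain code: 0 = right, 1 = up, 2 = left, 3 = down,
# expressed as (row delta, column delta) moves between adjacent pixels
DIRECTIONS = {(0, 1): 0, (-1, 0): 1, (0, -1): 2, (1, 0): 3}

def chain_code(boundary):
    """Encode an ordered list of boundary pixels (row, col) as the
    sequence of 4-directional moves between consecutive pixels."""
    code = []
    for (r0, c0), (r1, c1) in zip(boundary, boundary[1:]):
        code.append(DIRECTIONS[(r1 - r0, c1 - c0)])
    return code

# Clockwise walk around a unit square of pixels, back to the start
square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
code = chain_code(square)   # [0, 3, 2, 1]
```

Note how the code compacts the boundary: four pixel coordinates become four single digits, and the representation is translation-invariant.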
Representation and Description
The seventh stage is Recognition, which is the process that assigns a label (e.g., "vehicle") to an object based on its descriptors.
Key stages in DIP
Object recognition is a general term describing a collection of related computer vision tasks that involve identifying objects in digital photographs. Image classification involves predicting the class of one object in an image. Object localization refers to identifying the location of one or more objects in an image and drawing a bounding box around their extent. Object detection combines these two tasks, localizing and classifying one or more objects in an image. When a user or practitioner refers to "object recognition", they often mean "object detection".
Object recognition
As such, we can distinguish between these three computer vision tasks: Image Classification (predict the type or class of an object in an image), Object Localization (locate the presence of objects in an image and indicate their location with a bounding box), and Object Detection (locate the presence of objects with bounding boxes and predict the types or classes of the located objects in an image).
Object recognition
The R-CNN family of methods refers to R-CNN, which may stand for "Regions with CNN Features" or "Region-Based Convolutional Neural Network," developed by Ross Girshick et al. It includes the techniques R-CNN, Fast R-CNN, and Faster R-CNN, designed and demonstrated for object localization and object recognition. Let's take a closer look at the highlights of each of these techniques in turn.
Object recognition
R-CNN was described in the 2014 paper by Ross Girshick et al. from UC Berkeley titled "Rich feature hierarchies for accurate object detection and semantic segmentation." It may have been one of the first large and successful applications of convolutional neural networks to the problem of object localization, detection, and segmentation. The approach was demonstrated on benchmark datasets, achieving then state-of-the-art results on the VOC-2012 dataset and the 200-class ILSVRC-2013 object detection dataset.
Object recognition
Given the great success of R-CNN, Ross Girshick, then at Microsoft Research, proposed an extension to address its speed issues in a 2015 paper titled "Fast R-CNN." Fast R-CNN is proposed as a single model, instead of a pipeline, that learns and outputs regions and classifications directly. The architecture of the model takes the photograph and a set of region proposals as input, which are passed through a deep convolutional neural network. A pre-trained CNN, such as VGG-16, is used for feature extraction. The end of the deep CNN is a custom layer called the Region of Interest Pooling layer, or RoI Pooling, that extracts features specific to a given input candidate region.
Object recognition
The output of the CNN is then interpreted by a fully connected layer, and the model bifurcates into two outputs: one for the class prediction via a softmax layer, and another with a linear output for the bounding box. This process is repeated multiple times for each region of interest in a given image.
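Conceptually, the bifurcated head can be sketched in plain NumPy. This is only an illustration of the two-output idea, not the actual Fast R-CNN code; the feature dimension, class count, and weight matrices are all made up:

```python
import numpy as np

def two_head_output(features, w_cls, w_box):
    """Conceptual sketch of Fast R-CNN's two heads: one shared feature
    vector feeds a softmax over classes and a linear bounding-box
    regressor (x, y, w, h). Weights here are random, illustrative only."""
    logits = features @ w_cls
    exp = np.exp(logits - logits.max())
    class_probs = exp / exp.sum()          # softmax classification head
    box = features @ w_box                 # linear bounding-box head
    return class_probs, box

rng = np.random.default_rng(0)
feat = rng.normal(size=8)                  # a pooled RoI feature vector
probs, box = two_head_output(feat,
                             rng.normal(size=(8, 3)),   # 3 classes
                             rng.normal(size=(8, 4)))   # 4 box coordinates
```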
Object recognition
Faster R-CNN: the model architecture was further improved, for both speed of training and detection, by Shaoqing Ren et al. at Microsoft Research in the 2016 paper titled "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks." The architecture was designed to both propose and refine region proposals as part of the training process, via a component referred to as a Region Proposal Network, or RPN. These regions are then used in concert with a Fast R-CNN model in a single model design. These improvements both reduce the number of region proposals and accelerate the test-time operation of the model to near real-time, with then state-of-the-art performance.
Object recognition
Image Compression, as the name implies, deals with techniques for reducing the storage required to save an image, or the bandwidth required to transmit it. Although storage technology has improved significantly over the past decade, the same cannot be said for transmission capacity.
Key stages in DIP
Color Image Processing is an area that has been gaining in importance because of the significant increase in the use of digital images over the Internet.
Key stages in DIP
Questions and answers