Robot Computer Vision
Dr. Mohammad Iqbal
Based on slides by Dr. John (Jizhong) Xiao, The City College of New York
Pengantar Robotika – Universitas Gunadarma
Introduction
• What is computer vision?
  – Cameras can be thought of as an array of individual sensors
  – Computer vision attempts to use known constraints (whether from physics or known environmental structure of our world) to extract relevant information from the sensor values and this known world structure
• We will only deal with geometric issues of computer vision
  – i.e. not optics or illumination issues
• We will discuss two primary functions:
  1. Grouping/segmenting objects within an image
  2. Extracting position and orientation information from the image
Machine Vision and Robots
• Machine Vision: acquisition of image data, followed by processing and interpretation of that data for industrial applications such as inspection, measurement, etc.
  – Accurate object positioning
  – Maintaining relative position
  – Object measurement
  – Object recognition
  – Object registration
• Visual Servoing
Types of Machine Vision
• Inspection
• Identification
• Robot guidance
Types of Machine Vision
• 2D systems
  – Measure dimensions
  – Verify the presence of a component
  – Verify features and colors
  – Check printing and codes
  – Detect defects
• 3D systems
  – Reconstruction and inspection of complex shapes
Steps of Machine Vision
• Image acquisition
• Image processing and analysis
• Image interpretation
Example Machine Vision System
COGNEX
• 33rd year
• Public company (CGNX on NASDAQ)
• 750,000+ systems installed
• Worldwide leader in machine vision
Example Machine Vision System
Behind Cognex ID tools
• State-of-the-art technology:
  – Object location
  – Feature detection
  – Object recognition
  – Image processing
• Intellectual property
  – 193 patents
[Figures: PatMax Object Location; Cognex ID Technology]
Example Machine Vision System
COGNEX Products
Vision Sensors
• Single Perspective Camera
• Multiple Perspective Cameras (Stereo Camera Pair)
• Laser Scanner
• Omnidirectional Camera
• Structured Light Sensor
Vision Sensors
Single Perspective Camera
Vision Sensors
Multiple Perspective Cameras (Stereo Camera)
Vision Sensors
Laser Scanner (Aqsense)
Vision Sensors
Omnidirectional Camera
Vision Sensors
Structured Light
Machine Vision Mechanism: Visual Servoing
• A vision-guided robot has a closed control loop system.
Machine Vision Mechanism: Visual Servoing
• Camera configurations:
  – End-effector mounted
  – Fixed
Machine Vision Mechanism
• Example: ABB Integrated Vision
Machine Vision Mechanism
• Motoman MotoSight
Going Deeper into Machine Vision
1. Prepare the imaging system: camera (board/webcam), lens, scene.
2. Prepare vision programming: language, open-source vision framework (OpenCV, SimpleCV, ...)
3. Integrate the controller
4. Integrate vision programming with mechatronics programming
Going Deeper into Machine Vision
Choosing a Controller
1. Microcontroller:
   • Arduino
   • certain other types
2. Mini PC:
   • BeagleBone
   • Raspberry Pi
3. Desktop PC / laptop
4. Smartphone / tablet: Android
Camera Coordinate Frame
• The camera consists of a lens (focal length λ) and an image plane (where the pixel array is physically located)
• The image plane is an array of pixels of dimension $N_{rows} \times N_{columns}$
  – (u,v) are used to parameterize the image plane
• First define the camera coordinates:
  – Definition: the center of projection is the origin of the camera frame, located λ behind the image plane
    • The x and y axes of the camera frame are parallel to the image plane
  – Definition: the optical axis is the line that is collinear with the z coordinate of the origin of the camera frame
  – Definition: the principal point is the intersection point of the image plane and the optical axis
Camera Coordinate Frame
• Thus, any point on the image plane will have coordinates (u, v, λ)
• Perspective projection:
  – Let P be a point in the world with coordinates (x, y, z)
  – Let p be the projection of P onto the image plane with coordinates (u, v, λ)
  – The points P, p, and the origin of the camera frame are collinear, thus:
$$k \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} u \\ v \\ \lambda \end{bmatrix} \qquad\Longrightarrow\qquad u = \frac{\lambda x}{z}, \quad v = \frac{\lambda y}{z}$$
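To make the projection relations concrete, here is a minimal Python sketch; the function name `project` and the sample point are illustrative, not from the slides:

```python
# Minimal sketch of perspective projection. `lam` is the focal length λ,
# and P = (x, y, z) is a point expressed in camera coordinates.
def project(P, lam):
    """Project a 3D point in the camera frame onto the image plane."""
    x, y, z = P
    u = lam * x / z   # u = λx / z
    v = lam * y / z   # v = λy / z
    return u, v

print(project((0.2, -0.1, 2.0), lam=0.05))  # -> (0.005, -0.0025)
```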
Segmentation
• Dividing an image into separate components
• Numerous ways to do this; one is to choose thresholds
  – If a pixel value is above a threshold, it is in one group, and if it is below the threshold it is in another
Definitions
1. Histogram: for an 8-bit grayscale image (pixel values from 0–255), the histogram H(z) is a count of the number of occurrences of the value z
   • Note that: $0 \le H(z) \le N_{rows} \times N_{columns}$
   • And: $\sum_{z=0}^{255} H(z) = N_{rows} \times N_{columns}$
2. Probability that a pixel will have value z: $P(z) = \dfrac{H(z)}{N_{rows} \times N_{columns}}$
3. Mean value of the grayscale image: $\mu = \sum_{z=0}^{255} z\,P(z)$
4. Variance of a grayscale image: $\sigma^2 = \sum_{z=0}^{255} (z - \mu)^2\,P(z)$
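A minimal NumPy sketch of these definitions, using a random placeholder image (all variable names are illustrative):

```python
import numpy as np

# `img` stands in for an 8-bit grayscale image (values 0-255).
img = np.random.randint(0, 256, size=(480, 640), dtype=np.uint8)

H = np.bincount(img.ravel(), minlength=256)   # histogram H(z)
P = H / img.size                              # P(z) = H(z) / (Nrows * Ncolumns)
z = np.arange(256)
mu = np.sum(z * P)                            # mean of the image
sigma2 = np.sum((z - mu) ** 2 * P)            # variance of the image
```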
Definitions
• Suppose that the image is composed of a number of objects
• Instead of computing the mean for the whole image, we can compute the mean for each object
  – First, construct individual histograms for each object and for the background
  – Notation: $H_i$ is the histogram for the ith object, i = 0 is the background
  – Then the mean for the ith object is:

$$\mu_i = \sum_{z=0}^{255} z\,P_i(z) = \frac{\sum_{z=0}^{255} z\,H_i(z)}{\sum_{z=0}^{255} H_i(z)}$$

  – Called the conditional mean
  – The conditional variance is:

$$\sigma_i^2 = \sum_{z=0}^{255} (z - \mu_i)^2\,P_i(z) = \frac{\sum_{z=0}^{255} (z - \mu_i)^2\,H_i(z)}{\sum_{z=0}^{255} H_i(z)}$$
Threshold Selection
• Call $z_t$ the threshold
• We will first divide the image into two groups based upon the threshold:
  – If a pixel value $z > z_t$, that pixel belongs to group 1, otherwise group 0
• First we rewrite the conditional means and variances for each group
  – Definition: $q_i(z_t)$ is the probability that a pixel will belong to group i given the threshold $z_t$
  – Of course, $q_0(z_t) + q_1(z_t) = 1$

$$q_0(z_t) = \sum_{z=0}^{z_t} \frac{H(z)}{N_{rows} \times N_{columns}}, \qquad q_1(z_t) = \sum_{z=z_t+1}^{255} \frac{H(z)}{N_{rows} \times N_{columns}}$$
Threshold Selection
• Now let's rewrite the conditional means:

$$\mu_i = \sum_{z=0}^{255} z\,P_i(z) = \frac{\sum_{z=0}^{255} z\,H_i(z)/(N_{rows} \times N_{columns})}{\sum_{z=0}^{255} H_i(z)/(N_{rows} \times N_{columns})}$$

• Using the separation of the two groups by the threshold, we can write:

$$H_0(z) = \begin{cases} N_{rows} \times N_{columns}\,P(z) & z \le z_t \\ 0 & \text{else} \end{cases} \qquad H_1(z) = \begin{cases} N_{rows} \times N_{columns}\,P(z) & z > z_t \\ 0 & \text{else} \end{cases}$$

• Combining with the above conditional means:

$$\mu_0(z_t) = \frac{\sum_{z=0}^{z_t} z\,H(z)/(N_{rows} \times N_{columns})}{\sum_{z=0}^{z_t} H(z)/(N_{rows} \times N_{columns})} = \frac{\sum_{z=0}^{z_t} z\,P(z)}{q_0(z_t)}$$

$$\mu_1(z_t) = \frac{\sum_{z=z_t+1}^{255} z\,H(z)/(N_{rows} \times N_{columns})}{\sum_{z=z_t+1}^{255} H(z)/(N_{rows} \times N_{columns})} = \frac{\sum_{z=z_t+1}^{255} z\,P(z)}{q_1(z_t)}$$
Threshold Selection
• Similarly, the conditional variances are:

$$\sigma_0^2(z_t) = \frac{\sum_{z=0}^{z_t} \left( z - \mu_0(z_t) \right)^2 P(z)}{q_0(z_t)}, \qquad \sigma_1^2(z_t) = \frac{\sum_{z=z_t+1}^{255} \left( z - \mu_1(z_t) \right)^2 P(z)}{q_1(z_t)}$$

• OK, now we have the histograms, conditional means, and conditional variances for each group
  – but these are based upon some threshold $z_t$… how do we determine $z_t$?
• Intuitively, if we have a good choice for $z_t$, the variances will be small
  – i.e. the values of the pixels in a given group will be close to the group mean
  – Thus a good choice for $z_t$ is a value that minimizes all group variances
Threshold Selection
• Definition: the within-group variance is defined as:

$$\sigma_w^2(z_t) = q_0(z_t)\,\sigma_0^2(z_t) + q_1(z_t)\,\sigma_1^2(z_t)$$

  – This is a weighted average of the variances of the two groups
    • Weighted with the probability of a pixel being in that group
  – Choose $z_t$ to minimize $\sigma_w^2(z_t)$
  – To do this, iterate over all values of $z_t$ and choose the value that minimizes the within-group variance (see the sketch below)
  – Note that this can be computationally expensive and there are alternative approaches that are faster
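Scanning all candidate thresholds to minimize the within-group variance is essentially Otsu's method. Below is a minimal NumPy sketch of that exhaustive scan; the function name is illustrative and no attempt is made at the faster alternatives the slide mentions:

```python
import numpy as np

def within_group_threshold(img):
    """Pick the threshold z_t that minimizes the within-group variance."""
    H = np.bincount(img.ravel(), minlength=256)
    P = H / img.size
    z = np.arange(256)
    best_zt, best_var = 0, np.inf
    for zt in range(255):
        q0, q1 = P[:zt + 1].sum(), P[zt + 1:].sum()
        if q0 == 0 or q1 == 0:
            continue  # one group is empty: skip this threshold
        mu0 = (z[:zt + 1] * P[:zt + 1]).sum() / q0
        mu1 = (z[zt + 1:] * P[zt + 1:]).sum() / q1
        var0 = ((z[:zt + 1] - mu0) ** 2 * P[:zt + 1]).sum() / q0
        var1 = ((z[zt + 1:] - mu1) ** 2 * P[zt + 1:]).sum() / q1
        sw = q0 * var0 + q1 * var1  # within-group variance σ_w²(z_t)
        if sw < best_var:
            best_var, best_zt = sw, zt
    return best_zt
```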
Connected Components
• Now we have separated the image into two groups: object(s) and background
• It is possible that there are multiple objects in the image… how do we identify individual objects?
• First, we define connected pixels:
  – Consider a pixel with coordinates (r,c)
  – This pixel will have nearest neighbors (r−1,c), (r+1,c), (r,c−1), (r,c+1)
  – Two pixels are 4-connected if, for a given pixel, another pixel is one of its four nearest neighbors
  – If you include the diagonal pixels, we can define 8-connected in the same way
• A connected component is a set of pixels such that for any two pixels in the set, there is a connected path between them
Connected Components
• We want to assign a unique identifier to each set of connected components
  – To identify individual objects in the image, perform the following algorithm (a sketch follows below):
    1. Raster-scan the image from left to right and top to bottom
       • For a given pixel with coordinates (r,c), if the pixels immediately above and immediately to the left are background, then this is a new object: assign a new label
       • If either of the pixels directly above or directly to the left has received an assignment, take the minimum of these two as the label for pixel (r,c)
       • If both the pixel directly above and the pixel directly to the left have been assigned a value, note an equivalence
    2. Raster-scan again, this time noting the equivalence
       • Replace each label with the minimum of the label's equivalence
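A minimal Python sketch of this two-pass algorithm, assuming a binary NumPy image with nonzero foreground; the name `two_pass_label` and the union-find equivalence table are illustrative choices:

```python
import numpy as np

def two_pass_label(binary):
    """Two-pass 4-connected component labeling with an equivalence table."""
    rows, cols = binary.shape
    labels = np.zeros((rows, cols), dtype=int)
    parent = {}                      # union-find equivalence table

    def find(l):                     # resolve a label to its representative
        while parent[l] != l:
            l = parent[l]
        return l

    next_label = 1
    # Pass 1: provisional labels, recording up/left equivalences.
    for r in range(rows):
        for c in range(cols):
            if not binary[r, c]:
                continue
            up = labels[r - 1, c] if r > 0 else 0
            left = labels[r, c - 1] if c > 0 else 0
            neighbors = [l for l in (up, left) if l]
            if not neighbors:                    # new object: new label
                labels[r, c] = parent[next_label] = next_label
                next_label += 1
            else:
                labels[r, c] = min(neighbors)    # take the minimum label
                if len(neighbors) == 2:          # both assigned: equivalence
                    a, b = find(neighbors[0]), find(neighbors[1])
                    parent[max(a, b)] = min(a, b)
    # Pass 2: replace each label with the minimum of its equivalence class.
    for r in range(rows):
        for c in range(cols):
            if labels[r, c]:
                labels[r, c] = find(labels[r, c])
    return labels
```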
Connected Components
[Figure: (a) initial thresholded image, (b) assignment after first raster scan, (c) assignment after second raster scan, (d) final component assignment]
Indicator Function
• Once the image is separated into individual objects, it is very useful to construct an indicator function as a mapping to describe whether a pixel belongs to a particular object
  – Thus there are $N_{objects}$ indicator functions that are each the size of the original image

$$I_i(r,c) = \begin{cases} 1 & \text{pixel } (r,c) \text{ is in component } i \\ 0 & \text{else} \end{cases}$$
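Given a label image such as the output of the two-pass sketch above, each indicator function reduces to one comparison (illustrative):

```python
# Indicator function for component i, as a 0/1 array the size of the image.
# `labels` is a label image (e.g. from two_pass_label); `i` is a component id.
I_i = (labels == i).astype(int)
```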
Position and Orientation
• Finally, we get to the point that we can extract useful information from the world
• We use the indicator function to extract information about each object
  – Since we have a planar image, all we can get is position and orientation
• Intermediate step: moments
• Definition: the moment of the kth object (group) in the image plane is defined as:

$$m_{ij}(k) = \sum_{r=1}^{N_{rows}} \sum_{c=1}^{N_{columns}} r^i c^j\, I_k(r,c)$$

• Note: $m_{00}$ is the number of pixels in a given object
Position and Orientation
• The order of the moment is defined as i + j
• First-order moments are very useful in computing the centroid of an object:

$$m_{10}(k) = \sum_{r=1}^{N_{rows}} \sum_{c=1}^{N_{columns}} r\, I_k(r,c), \qquad m_{01}(k) = \sum_{r=1}^{N_{rows}} \sum_{c=1}^{N_{columns}} c\, I_k(r,c)$$
Position and Orientation
• Now we can use the moments of a group to directly get information about the object it represents
Centroids
• Define the centroid as the point at which, if all mass were at this point, the first moment would not change
  – Notation: $(\bar r, \bar c)$
  – From this definition, we can say:

$$\sum_{r=1}^{N_{rows}} \sum_{c=1}^{N_{columns}} \bar r_k\, I_k(r,c) = \sum_{r=1}^{N_{rows}} \sum_{c=1}^{N_{columns}} r\, I_k(r,c), \qquad \sum_{r=1}^{N_{rows}} \sum_{c=1}^{N_{columns}} \bar c_k\, I_k(r,c) = \sum_{r=1}^{N_{rows}} \sum_{c=1}^{N_{columns}} c\, I_k(r,c)$$

$$\bar r_k\, m_{00}(k) = m_{10}(k), \qquad \bar c_k\, m_{00}(k) = m_{01}(k)$$

$$\bar r_k = \frac{m_{10}(k)}{m_{00}(k)}, \qquad \bar c_k = \frac{m_{01}(k)}{m_{00}(k)}$$
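A minimal NumPy sketch of the centroid computed from an indicator function (the function name is illustrative; `I_k` is a 0/1 mask as defined above):

```python
import numpy as np

def centroid(I_k):
    """Centroid (r̄, c̄) of a component from its indicator function."""
    rows, cols = np.nonzero(I_k)    # coordinates of the component's pixels
    m00 = I_k.sum()                 # m00 = number of pixels in the object
    r_bar = rows.sum() / m00        # m10 / m00
    c_bar = cols.sum() / m00        # m01 / m00
    return r_bar, c_bar
```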
Central Moments
• Calculate the moment with respect to the object's center of mass
• Called the central moments

$$C_{ij}(k) = \sum_{r=1}^{N_{rows}} \sum_{c=1}^{N_{columns}} (r - \bar r_k)^i (c - \bar c_k)^j\, I_k(r,c)$$

• Thus the moments are now invariant to translation of the object
Object Orientation
• Define the line that minimizes the second moment as the orientation of the object
  – This second moment is:

$$L = \sum_{r=1}^{N_{rows}} \sum_{c=1}^{N_{columns}} d^2(r,c)\, I(r,c)$$

  – where d(r,c) is the distance from pixel (r,c) to the line
• Now we require two parameters to define a line, ρ and θ:

$$x\cos\theta + y\sin\theta - \rho = 0$$

• So (cos θ, sin θ) is the unit normal to the line and ρ is the minimum distance from the line to the origin
• Using this parameterization, the distance from the line to a point (r,c) is:

$$d(r,c) = r\cos\theta + c\sin\theta - \rho$$
Object Orientation
• Now suppose that L* is the minimum value of L, given as:

$$L^* = \min_{\rho,\theta} \sum_{r=1}^{N_{rows}} \sum_{c=1}^{N_{columns}} \left( r\cos\theta + c\sin\theta - \rho \right)^2 I(r,c)$$

• To find this minimum, we can take partial derivatives with respect to ρ and θ and set them to zero
  – First with respect to ρ:

$$\begin{aligned}
\frac{\partial L}{\partial \rho} &= \frac{\partial}{\partial \rho} \sum_{r=1}^{N_{rows}} \sum_{c=1}^{N_{columns}} (r\cos\theta + c\sin\theta - \rho)^2\, I(r,c) \\
&= \sum_{r=1}^{N_{rows}} \sum_{c=1}^{N_{columns}} \frac{\partial}{\partial \rho}\left( r^2\cos^2\theta + c^2\sin^2\theta + \rho^2 + 2rc\sin\theta\cos\theta - 2r\rho\cos\theta - 2c\rho\sin\theta \right) I(r,c) \\
&= -2\cos\theta \sum_{r}\sum_{c} r\, I(r,c) - 2\sin\theta \sum_{r}\sum_{c} c\, I(r,c) + 2\rho \sum_{r}\sum_{c} I(r,c) \\
&= -2\, m_{00} \left( \bar r\cos\theta + \bar c\sin\theta - \rho \right)
\end{aligned}$$
Object Orientation
• Setting this to zero gives the following:

$$\bar r\cos\theta + \bar c\sin\theta - \rho = 0$$

  – since we are assuming that an object will have at least one pixel, so $m_{00} \neq 0$
  – But this is just saying that the line must pass through the centroid
  – We use this to simplify the remaining equations
• Define new coordinates:

$$r' = r - \bar r, \qquad c' = c - \bar c$$

  – Since the line that minimizes L passes through the centroid, it also passes through the point r′ = 0, c′ = 0
  – So we can write:

$$r'\cos\theta + c'\sin\theta = (r - \bar r)\cos\theta + (c - \bar c)\sin\theta = 0$$
Object Orientation
• Now take the partial derivative with respect to θ
  – First some simplifications:

$$\begin{aligned}
L &= \sum_{r=1}^{N_{rows}} \sum_{c=1}^{N_{columns}} \left( r'\cos\theta + c'\sin\theta \right)^2 I(r,c) \\
&= \cos^2\theta \sum_{r}\sum_{c} r'^2\, I(r,c) + 2\sin\theta\cos\theta \sum_{r}\sum_{c} r'c'\, I(r,c) + \sin^2\theta \sum_{r}\sum_{c} c'^2\, I(r,c)
\end{aligned}$$

  – The three sums are exactly the second central moments:

$$\sum_{r}\sum_{c} (r - \bar r)^2\, I(r,c) = C_{20}, \qquad \sum_{r}\sum_{c} (r - \bar r)(c - \bar c)\, I(r,c) = C_{11}, \qquad \sum_{r}\sum_{c} (c - \bar c)^2\, I(r,c) = C_{02}$$

  – so:

$$L = C_{20}\cos^2\theta + 2\,C_{11}\sin\theta\cos\theta + C_{02}\sin^2\theta$$
Object Orientation
• Further simplification (using the double-angle identities):

$$\begin{aligned}
L &= C_{20}\cos^2\theta + 2\,C_{11}\sin\theta\cos\theta + C_{02}\sin^2\theta \\
&= C_{20}\left( \tfrac{1}{2} + \tfrac{1}{2}\cos 2\theta \right) + C_{11}\sin 2\theta + C_{02}\left( \tfrac{1}{2} - \tfrac{1}{2}\cos 2\theta \right) \\
&= \tfrac{1}{2}\left( C_{20} + C_{02} \right) + \tfrac{1}{2}\left( C_{20} - C_{02} \right)\cos 2\theta + C_{11}\sin 2\theta
\end{aligned}$$

• Take the partial derivative with respect to θ:

$$\frac{\partial L}{\partial \theta} = \frac{\partial}{\partial \theta}\left[ \tfrac{1}{2}\left( C_{20} + C_{02} \right) + \tfrac{1}{2}\left( C_{20} - C_{02} \right)\cos 2\theta + C_{11}\sin 2\theta \right] = -\left( C_{20} - C_{02} \right)\sin 2\theta + 2\,C_{11}\cos 2\theta$$
Object Orientation
• Set this to zero:

$$0 = -\left( C_{20} - C_{02} \right)\sin 2\theta + 2\,C_{11}\cos 2\theta \qquad\Longrightarrow\qquad \tan 2\theta = \frac{2\,C_{11}}{C_{20} - C_{02}}$$

• Finally, the orientation is defined by the following:

$$\theta = \frac{1}{2}\tan^{-1}\!\left( \frac{2\,C_{11}}{C_{20} - C_{02}} \right)$$
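Putting the central moments and the orientation formula together, a minimal NumPy sketch (the function name is illustrative; `arctan2` is used as a common quadrant-safe variant of the arctangent above, which also handles $C_{20} = C_{02}$):

```python
import numpy as np

def orientation(I_k):
    """Orientation θ of a component from its second central moments."""
    r, c = np.nonzero(I_k)
    r_bar, c_bar = r.mean(), c.mean()      # centroid (r̄, c̄)
    rp, cp = r - r_bar, c - c_bar          # shifted coordinates r', c'
    C20 = np.sum(rp ** 2)                  # second central moments
    C11 = np.sum(rp * cp)
    C02 = np.sum(cp ** 2)
    # θ = ½ tan⁻¹( 2 C11 / (C20 − C02) )
    return 0.5 * np.arctan2(2 * C11, C20 - C02)
```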
Camera Calibration
• Now that we know the position and orientation of objects in the camera coordinate frame, we want to use these in a robotic manipulation task
  – i.e. we want to convert from camera coordinates to world coordinates
  – We said that we do not have enough information to extract the world coordinates from a 2D image, but if we know the camera's position and orientation relative to the world frame, we can write:

$$x^w = R_c^w\, x^c + o_c^w$$
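A minimal NumPy sketch of applying this rigid transform; the rotation and translation below are placeholders standing in for a real calibration result:

```python
import numpy as np

# Assumed calibration results: rotation R_c^w of the camera frame in the
# world frame, and the camera origin o_c^w expressed in world coordinates.
R_wc = np.eye(3)                   # placeholder rotation
o_wc = np.array([0.5, 0.0, 1.0])   # placeholder translation

x_c = np.array([0.1, -0.2, 2.0])   # a point in camera coordinates
x_w = R_wc @ x_c + o_wc            # the same point in world coordinates
```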
Other Aspects of Vision
• Feature detection: a way of abstracting information from images
• Example: edge detection
  – Detect abrupt changes in an image
  – i.e. use (estimated) image derivatives (with finite differences)
    • Qualitatively equivalent to spatial high-pass filtering
  – Estimate derivatives using finite differences:

$$\frac{\partial h}{\partial x} \approx h_{i+1,j} - h_{i-1,j}$$

    • where $h_{i,j}$ are the pixel values at position (i,j)
    • This is equivalent to a convolution, where the kernel is:

$$\begin{bmatrix} 0 & 0 & 0 \\ -1 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix}$$

    • This will have a strong positive response to a vertical edge that is positive on one side and negative on the other
    • But this will also give a strong response in the presence of noise!
    • How to alleviate that?… First smooth the image with a Gaussian filter
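A minimal SciPy sketch of this pipeline: Gaussian smoothing first, then convolution with the central-difference kernel (the input image and the σ value are placeholders):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, convolve

img = np.random.rand(480, 640)             # placeholder grayscale image

smoothed = gaussian_filter(img, sigma=1.5)  # suppress noise before differentiating
kx = np.array([[0, 0, 0],
               [-1, 0, 1],
               [0, 0, 0]], dtype=float)     # central-difference kernel
dx = convolve(smoothed, kx)                 # estimated ∂h/∂x
```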
Other Aspects of Vision
• The Hough transform
  – Once we have an image that has its features extracted, we may want to know information about the features
    • In the case that the features are edges, maybe we want the positions and angles of the edges
  – Hough transform, simplest form: represent a line by a parameterization (for example ρ and θ, as we have already done), then transform each pixel in the image to the (ρ,θ) space
    • i.e. any line can be represented as a point in (ρ,θ)
  – There can be an infinite number of lines through any point
  – For example, all the lines that go through a point $(x_0, y_0)$ must obey the following:

$$\rho(\theta) = x_0\cos\theta + y_0\sin\theta$$

  – This gives sinusoids in the Hough space for every point in the image
  – The points where two sinusoids intersect represent two points that can lie on the same line
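A minimal NumPy sketch of the voting scheme: every edge pixel traces its sinusoid through a discretized (ρ, θ) accumulator. The bin counts and function name are illustrative:

```python
import numpy as np

def hough_lines(edges, n_theta=180, n_rho=200):
    """Accumulate (rho, theta) votes for every edge pixel."""
    rows, cols = edges.shape
    thetas = np.linspace(0, np.pi, n_theta, endpoint=False)
    rho_max = np.hypot(rows, cols)          # largest possible |rho|
    acc = np.zeros((n_rho, n_theta), dtype=int)
    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        rho = x * np.cos(thetas) + y * np.sin(thetas)  # sinusoid for (x0, y0)
        bins = ((rho + rho_max) / (2 * rho_max) * (n_rho - 1)).astype(int)
        acc[bins, np.arange(n_theta)] += 1             # one vote per theta
    return acc, thetas
```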
Other Aspects of Vision
• The Hough transform
  – Thus the Hough transform breaks down into finding the points of highest intersection in the Hough space
  – But how to do that?… it may be unclear how many to select
Other Aspects of Vision
• Optical flow
  – A measure of the movement of features in a visual field
  – e.g. elementary motion detector
  – Creates a vector field that locally describes motion
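For completeness, a dense optical-flow sketch using OpenCV (one of the open-source frameworks mentioned earlier); the frame filenames are placeholders and the numeric parameters are the common tutorial defaults:

```python
import cv2

# Two consecutive grayscale frames (placeholder filenames).
prev_gray = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
next_gray = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

# Dense flow: one (dx, dy) vector per pixel, locally describing motion.
flow = cv2.calcOpticalFlowFarneback(prev_gray, next_gray, None,
                                    0.5, 3, 15, 3, 5, 1.2, 0)
```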
Thank you!