A Survey on Infrared Imaging with focus on Human Tracking

Nikhil Arun Naik

ECE 671

SPRING 2004

Table of Contents

Abstract

1.1. Introduction
    1.1.1 Motivation
    1.1.2 Mission

1.2. Infrared imaging
    1.2.1 Overview
    1.2.2 General Applications

1.3. Infrared imaging based tracking
    1.3.1 Object tracking
    1.3.2 Face region tracking

1.4. Human tracking using infrared imaging
    1.4.1 Background on human tracking
    1.4.2 Survey on human tracking techniques

1.5. Calibration of infrared cameras
    1.5.1 Survey on Black body calibrators for IR cameras

1.6. Conclusions
    1.6.1 Summary
    1.6.2 Future work

References
    Papers
    Bibliography
    Websites


Abstract

This project is a survey on infrared imaging, with the main focus on its use in human tracking systems. We present a brief review of the major applications of infrared imaging and an in-depth survey of its use in human tracking systems. The other applications touched upon include motion detection, face recognition, pattern recognition, intrusion detection, surveillance and others, but we treat them only briefly.

The survey focuses mainly on general object tracking and human tracking techniques using infrared imaging. Besides continuously monitoring the movements of a person, human tracking systems allow us to locate people and to detect humans and other objects in a given field of view. Humans are tracked by looking for features such as the face, head, bust and silhouette. Looking for changes in temperature over a given region during a given period of time also helps. We also look at some calibration techniques for infrared cameras and present a commercial survey of the available black body calibrators for infrared cameras.


1.1. Introduction

As the title and abstract indicate, this project is a survey on infrared imaging with its main focus on human tracking systems. Infrared imaging is a subfield of the vast field of image processing, and it is developing fast, with its scope set to grow in the coming years because of the strong focus now placed on security systems. What is infrared? Infrared is a band of energy in the 2 µm to 100 µm wavelength range of the electromagnetic spectrum [2]. The visible spectrum covers only wavelengths from 0.4 µm to 0.7 µm; the band of longer wavelengths above it is the infrared spectrum, and the band of shorter wavelengths below it is the ultraviolet spectrum [2]. Infrared light behaves very much like visible light [2]. It travels at the speed of light (2.998 × 10^8 m/s) and, just like visible light, it can be reflected, refracted, absorbed and emitted [2].

An infrared image is a pattern generated in proportion to a temperature function of the area or object being imaged [2]. It is obtained on the principle that the vibration and rotation of atoms and molecules in an object cause the object to give off heat, which an infrared sensor captures to produce an image [2]. The Stefan-Boltzmann law tells us that the infrared power radiated by an object is directly proportional to the fourth power of its absolute temperature; hence the output power of the object increases very rapidly as its absolute temperature rises [2].
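To make the fourth-power dependence concrete, the following is a minimal Python sketch of the Stefan-Boltzmann law. It is an illustration only: the function names, the grey-body emissivity parameter and the example temperatures are our own assumptions and do not come from [2].

```python
# Stefan-Boltzmann constant in W / (m^2 K^4) (CODATA value).
SIGMA = 5.670374419e-8


def radiant_exitance(temp_kelvin, emissivity=1.0):
    """Power radiated per unit area (W/m^2) by a body at temp_kelvin.

    emissivity=1.0 models an ideal black body; a lower value models a
    grey body (an assumption made here for illustration).
    """
    if temp_kelvin < 0:
        raise ValueError("absolute temperature cannot be negative")
    return emissivity * SIGMA * temp_kelvin ** 4


def power_ratio(temp_a_kelvin, temp_b_kelvin):
    """Ratio of radiated power between two bodies of equal emissivity."""
    return (temp_a_kelvin / temp_b_kelvin) ** 4


if __name__ == "__main__":
    # Doubling the absolute temperature multiplies the power by 2**4 = 16.
    print(power_ratio(600.0, 300.0))
    # Hypothetical example: skin at 305 K against a 293 K background.
    print(power_ratio(305.0, 293.0))
```

Because of the fourth power, even the few kelvin that separate a person from a room-temperature background give a clearly measurable difference in radiated power.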

The infrared imaging spectrum can be broadly divided into two wavelength ranges [2]: the mid wavelength band infrared (MWIR), which covers the 3 µm to 5 µm range, and the long wavelength band infrared (LWIR), which covers the 8 µm to 14 µm range [2]. The choice of band depends on the performance desired for the specific application [2].
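One way to get a feel for band selection is Wien's displacement law, which gives the wavelength at which a black body radiates most strongly. The law is not discussed in [2]; it is brought in here as a supplementary assumption, and the function names and band tuples below are hypothetical. The sketch reports which of the two bands contains the emission peak for a given temperature.

```python
# Wien displacement constant in micrometre-kelvin (CODATA value).
WIEN_B_UM_K = 2897.771955

# Band limits in micrometres, as quoted in the text above.
MWIR_BAND_UM = (3.0, 5.0)
LWIR_BAND_UM = (8.0, 14.0)


def peak_wavelength_um(temp_kelvin):
    """Wavelength (micrometres) of peak black-body emission."""
    if temp_kelvin <= 0:
        raise ValueError("absolute temperature must be positive")
    return WIEN_B_UM_K / temp_kelvin


def band_of_peak(temp_kelvin):
    """Name the band that contains the emission peak, if any."""
    peak = peak_wavelength_um(temp_kelvin)
    if MWIR_BAND_UM[0] <= peak <= MWIR_BAND_UM[1]:
        return "MWIR"
    if LWIR_BAND_UM[0] <= peak <= LWIR_BAND_UM[1]:
        return "LWIR"
    return "outside both bands"


if __name__ == "__main__":
    for temp in (310.0, 450.0, 800.0):  # body heat, warm machinery, hot metal
        print(temp, round(peak_wavelength_um(temp), 2), band_of_peak(temp))
```

Under this simplified view, objects near body temperature peak inside the LWIR band while much hotter objects peak in the MWIR band. Real band selection also depends on the detector, optics and atmosphere.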


Figure 1: Diagram of the electromagnetic spectrum, showing the visible spectrum with the infrared and the ultraviolet spectrum on either side. The picture was obtained from the official website of TEAMWORKnet Inc.; it was part of a paper presented by Harry Tittel, Vice President, TEAMWORKnet Inc.

In the figure above we can see a high interference zone between the two infrared bands [2]. MWIR has been observed to be better suited to hotter objects, or to cases where sensitivity matters less than contrast [2]. MWIR also has the advantage of requiring smaller optics [2]. Traditionally LWIR is preferred where high performance infrared imaging is required, since it is more sensitive to objects at ambient temperature and transmits better through smoke, mist and fog [2]. MWIR and LWIR differ substantially in background flux, temperature contrast and atmospheric transmission.


Mid wavelength band infrared (MWIR)

• It has a higher resolution due to a smaller optical diffraction [2].
• Higher contrast [2].
• Good only in clear weather conditions [2].
• Transmission is possible in high humidity conditions [2].

Long wavelength band infrared (LWIR)

• It shows a good performance in foggy, hazy and misty conditions [2].
• Its transmission is least affected by atmospheric conditions [2].
• It reduces solar glint and fire glare sensitivity [2].

Table 1: Brief description of the advantages and applications of MWIR and LWIR.

1.1.1. Motivation

In our fall 2003 ECE 573 project, Infrared Imaging in Modular Multipurpose Multi-sensor Robot, and in our spring 2004 ECE 574 project, Infrared Imaging Sensor Brick for the MODSEN Robot, we have been working towards building an infrared sensor brick carrying an Omega infrared camera for data capture. Since this system could be set up as a small, self-sufficient device able to perform search and surveillance operations on its own, we decided to explore the possibility of setting up a human tracking system on it. This was the primary motivation for this project. We felt the need to conduct a complete literature review of the possible applications of infrared imaging, with special focus on human tracking.

The system expected to result from integrating the two projects, that is, building the infrared sensor brick and then setting up a human tracking system on it, promises to be of great use in search and surveillance operations. Its inherent advantages are that it is small and light and yet can perform the sophisticated task of human tracking. The ability to capture infrared data and track human beings is what makes it special: in the dark, where human vision fails, infrared imagery senses heat and so provides the night vision needed to detect possible ambushes, plots and hidden enemies. In the past, people have successfully set up robotic systems similar to ours using a vision camera for human tracking. Further applications of the brick could include face recognition and pattern recognition. Since the world is currently seeing an unprecedented rise in concern for both safety and security, such human tracking systems should have a bright future.

1.1.2. Mission

In this project we conduct a survey of the possible applications of infrared imaging, with the main focus on its use in human tracking systems. A brief review of the major applications of infrared imaging is provided, followed by an in-depth survey of the use of infrared imaging in human tracking systems. Applications such as motion and intrusion detection, face and pattern recognition, surveillance and others are touched upon, but only briefly. All these other applications could feature as future work for our integrated system, so their literature review is completed here.

The survey focuses mainly on general object and human tracking techniques and systems using infrared imaging. Besides continuously monitoring the movements of a person, human tracking systems allow us to locate people, follow them and perhaps even alert the concerned authorities by sounding an alarm if people cross certain boundaries in a given field of view. Humans are tracked by looking for features such as the face, head, bust and silhouette. We could also look for changes in temperature over a given region during a given period of time.
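The last cue, a change in temperature over a region during a period of time, can be sketched in a few lines of Python. This is a hypothetical illustration and not a method taken from any of the surveyed papers: frames are assumed to be lists of rows of temperature-like values, and the region format and the `min_change` threshold are our own choices.

```python
def region_mean(frame, top, left, height, width):
    """Mean value of a rectangular region of one frame (a list of rows)."""
    values = [value
              for row in frame[top:top + height]
              for value in row[left:left + width]]
    if not values:
        raise ValueError("region is empty or lies outside the frame")
    return sum(values) / len(values)


def temperature_changed(frames, region, min_change):
    """True if the region mean varies by at least min_change over the frames.

    region is a (top, left, height, width) tuple; min_change is in the
    same units as the frame values.
    """
    means = [region_mean(frame, *region) for frame in frames]
    return max(means) - min(means) >= min_change


if __name__ == "__main__":
    cold = [[20.0] * 4 for _ in range(4)]
    warm = [row[:] for row in cold]
    for r in (1, 2):
        for c in (1, 2):
            warm[r][c] = 30.0  # a warm object enters the watched region
    print(temperature_changed([cold, warm], (1, 1, 2, 2), min_change=5.0))
```

A practical system would need to handle sensor noise and slow ambient drift, which this sketch ignores.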


1.2. Infrared Imaging

As discussed in the introduction, an infrared image gives us a pattern corresponding to the heat energy radiated by the area or object being imaged. Infrared images are usually used to see things that are not naturally visible to the human eye. For instance, a thermal image can provide night vision, letting us detect objects in the dark that would remain invisible in a normal visual image. Night vision has useful applications in search and security operations, where we can sense and detect hidden objects or people who might have set an ambush or a plot in the dark. These objects would not be visible to the human eye in the absence of light.

A thermal image can also be used to detect visually occluded items, that is, objects that are hidden behind other objects. This works because a thermal image shows the pattern of heat energy emitted in an area or by an object, so any temperature difference in that area or object shows up very clearly in the image. Thermal images can also be used for face recognition and pattern recognition. In face recognition they can serve as identification markers that help control the entry and exit of people at secure locations. Another major application of thermal imagery is in quality control for the manufacture of many products, ranging from food and glass to cast iron patterns, moulds and others. Here thermal imagery supports quality assurance and surveillance on the production line. Let us now take a brief overview of the field of infrared imaging and then look at its general applications.


1.2.1 Overview

Thermal imaging as it stands today was developed in the early 1800s by Sir John Herschel [2]. He was actually interested in photography, and started recording energy in the infrared spectrum by conducting a number of experiments using carbon and alcohol [2]. The analysis of humans based on temperature differences between regions of the body is, however, believed to be much older. The ancient Greeks, and before them the Egyptians, knew a method of diagnosing diseases by locating excessively hot and cold regions of the body [2].

According to [2], infrared imaging finds applications in three major areas: monitoring, research and development, and general industrial applications [2]. Firstly, monitoring is used for security purposes, such as watching a prison area to keep a check on prisoners at night, guarding a secure location against intruders, and coastal surveillance to look out for enemy vessels at night [2]. Monitoring can also be used to look out for possible illegal activities or wrongdoing, and to protect people from the threat of possible predators in the dark [2].

Secondly, monitoring is used to control traffic, whether vehicular or air traffic. Vehicular traffic can be monitored at night using infrared imagery. Air traffic can be monitored during takeoff and landing in the absence of light [2]. Thirdly, monitoring is used in disaster management and rescue operations, where infrared images help in locating victims, spotting fire, and monitoring rescue and other operations through smoke and fog [2]. Lastly, monitoring is also used to obtain disaster footage for television and other media [2].

The applications of infrared imaging in research and development are very varied and include remote sensing, ecology and medical treatment, thermal sensing and thermal design [2]. In remote sensing, infrared is used for aerial observation of the earth's ocean and land surfaces, observation of volcanic activity, prospecting for mining and other natural reserves [2], and other exploratory operations, which may even include military and civilian spying. In ecology and medical treatment, infrared is used to detect diseases in plants and animals [2]. It can also help in the early detection of diseases such as breast cancer and in eye check-ups [2]. It is also used in diagnosing heart disease and in checking the functioning of the blood stream and the circulatory system [2]. Infrared also finds application in veterinary medicine, for diagnosing animals such as cats, dogs and horses [2]. In thermal analysis, the applications are in the design of heat engines [2] and in analyzing the thermal insulating or heat conducting capacity of a body or substance [2]. In thermal design, the applications are in analyzing heat emission levels and rates [2]; infrared can also help in evaluating transient thermal phenomena in electronic components [2] and, on a bigger scale, in evaluating power plants [2].

The general industrial applications of infrared are in facility monitoring, non-destructive inspection and the management of manufacturing processes [2]. In monitoring facilities such as a chemical plant or a nuclear reactor, infrared helps in looking for abnormal heat emission levels in boilers or distillers [2]. Gas leaks and faulty piping can also be detected and tracked [2]. Power transmission lines and transformers are monitored as well [2]. In non-destructive testing, infrared is used to inspect buildings, roofs, walls and floor surfaces [2], to inspect cold storage facilities, and to find internal defects in walls and other optically opaque objects [2]. The major industrial application of non-destructive testing is in checking motors, bearings and rings [2]. Managing manufacturing processes accurately is a difficult and challenging task, but the advantages of infrared can be put to good use, for example in maintaining the distribution of heat in a furnace or in metal rolling processes [2]. Infrared is also used to manage smoke and thermal exhausts and to control the temperature of metallic molding processes [2].


1.2.2 General Applications

In the sections above we have seen a brief overview of the origin, uses and applications of infrared imaging. In this section we place slightly more emphasis on the known applications of infrared imaging and cover them in brief, before focusing on the main application of interest, human tracking.

The major application of infrared imaging is night vision, that is, seeing people and objects in the dark by creating a pattern in response to the heat radiated by objects in the area. Thermal images help in sensing, and thereby detecting, hidden objects or people that are not naturally visible to the human eye in the dark [8]. This finds major use in both military and civilian search and surveillance applications. Face recognition and pattern recognition are among the more classical applications of infrared imaging [8]. This is because thermal images are invariant to both pose and illumination, so the prevalent lighting and other conditions that affect visual images do not affect infrared images. Face recognition could be used for hunting down wanted suspects, and pattern recognition could encompass area surveillance and human detection to keep a check on unforeseen activities in a secure area. Infrared imaging also has many industrial applications, such as quality control in the manufacture of many products, ranging from food and glass to cast iron patterns, moulds and other products where quality assurance and surveillance on the production line are necessary [8].

In a fire-related disaster, infrared imaging is used to detect leaks and fire, see through smoke, search for victims, detect the presence of other flammable substances in the area, and provide vision for all further rescue operations [8]. As in other military operations, infrared imaging finds applications in naval operations, where it is used to detect possible oil spills and threats from enemy vessels at night [8]. The air force uses it for aerial surveillance and enemy detection [8]. Commercial applications include the coverage of disasters through smoke and dark areas for use by the media and television networks [8].

The growth of infrared technology has been a big boon for the maintenance industry, since infrared video helps in seeing through visually opaque objects and so provides vital details for predictive maintenance. Over the years the maintenance industry has changed its approach. Traditionally, maintenance was seen as necessary only after a breakdown; after World War II it became maintenance for the purpose of preventing a breakdown; and nowadays it is predictive maintenance [2]. Predictive maintenance involves thermal imaging surveys, electromagnetic testing, breaker relay testing, visual testing and leak testing [2]. Other areas include magnetic particle inspection, ultrasonic inspection, vibration analysis, eddy current analysis, transformer oil analysis, and X-ray and gamma ray radiography [2]. The ability of infrared images to reveal hot spots and temperature differences without any physical contact is what makes them special for predictive maintenance [2].

The major applications of infrared imaging given in [2] are listed below. Infrared imaging is used to inspect heater tubes, in steam/air de-coking, to confirm thermocouple readings, to check for flame impingement and refractory breakdown, to inspect condenser fins and to verify spheroid levels [2]. It is also used to check for air leakage in a furnace and to locate gas emissions, and it finds application in the aerospace industry [2]. In electrical systems, thermal images are used to inspect power generators and power substations; to evaluate transformers and capacitors; to inspect both rural and urban overhead electrical distribution lines; and to inspect electric motors and check motor control centers, starters, breakers, fuses, cables and wires [2].

The automotive applications of thermal imaging include detecting faulty fuel injection nozzles and testing brakes and other engine systems to evaluate their performance and cooling efficiency [2]. It is also used in diagnostics of motor racing suspension and tire contact [2]. In electronic equipment it is used for evaluating and troubleshooting printed circuit boards, for thermal mapping of semiconductor device services, and for evaluating circuit board components [2]. It is also used to inspect hybrid microcircuits and solder joints, and to inspect bonded structures [2].

As in other fields, thermal imaging has applications in mechanical systems. Here it is used to inspect boilers and kilns, to diagnose buildings and their heat loss, and to inspect roofing systems [2]. It is also used to inspect burners for flame impingement and burner management, to analyze fuel combustion patterns, and to detect thermal patterns on boiler tubes and measure the tube skin temperature during normal or standby operation [2]. It is used to scan and record temperatures in unmonitored areas of the boiler, and to scan the exterior of the boiler for refractory damage or to locate warmer areas [2]. Infrared images also help in detecting coke buildup in crude furnaces and flue gas leaks in power plant boilers [2]. They are used to inspect mechanical bearings; to evaluate heating, ventilation and air conditioning equipment; to evaluate cold storage for cooling losses; and to check refrigeration equipment for insulation leaks [2].

In medical science and veterinary medicine, infrared imaging is used to look for diseases such as breast cancer and arthritis, and to conduct medical examinations for whiplash, back injuries and carpal tunnel syndrome [2]. It is also used in dentistry, and to evaluate sports injuries and monitor recovery [2]. In veterinary medicine it is used to check for injuries, stress fractures and lameness [2]. Thermal imaging helps in detecting sensations such as pain and numbness, which a normal image would not show because they are problems with the function of the tissues rather than with their structure [5]. Infrared imaging helps in detecting alterations in the body's workings such as pain and inflammation, nerve irritation and dysfunction, angiogenesis (new blood vessel formation), circulatory incompetencies and treatment efficacies, none of which can be detected by regular imaging [5]. Infrared imaging is used for assessing pain and inflammation: it helps in assessing musculoskeletal and articular pain, and in assessing the efficacy of chiropractic, osteopathic, physiotherapy, acupuncture and myotherapy care [5]. It is used in assessing post-injury trauma, in post-surgery assessment and in confirming the diagnosis of certain diseases [5].

The next application is in assessing nerve dysfunction and irritation. Here infrared imaging helps in examining the correlation between musculoskeletal findings and neurological irritation or impairment, and in looking at suspected nerve entrapments [5]. It is used to assess reflex sympathetic dystrophy, complex regional pain disorder, sympathectomies and nerve blocks [5]. In investigating angiogenesis, infrared imagery helps in breast imaging alongside anatomical screening, in post skin cancer investigations, and in assessing the level of acceptance or rejection after a skin graft [5]. Lastly, in the case of circulatory insufficiencies, infrared helps in mapping varicose veins [5].

An infrared imaging system was reportedly used by many countries to detect passengers entering the country with a high body temperature, to guard against the entry of the SARS virus [1]. Infrared imaging is also proving to be a very useful tool for law enforcement professionals, helping them stop crime before it happens [LETA]. Since a thermal imager can measure very small temperature differences and build an infrared picture from them, it allows us to see in almost zero lighting conditions [LETA]. Law enforcement authorities can use these pictures to catch criminals. This ability of the thermal imager to help prevent crime is recognized by the Law Enforcement Thermographers Association (LETA), which has 11 accepted applications of thermal imaging that can help in crime prevention [LETA]. An application is accepted once a state or federal court judgment has accepted infrared images as evidence in a case [LETA].

Currently there are 11 applications of infrared imaging accepted by the LETA. They are

discussed in brief here.


• Hidden compartments: thermal imaging can help detect hidden compartments in vehicles, which may be used for transporting illegal drugs, contraband or even people. Since a thermal imager can detect any change in the thermal characteristics of a surface caused by an adjoining wall or bulkhead, it highlights structural details invisible to the naked eye [LETA].

• Perimeter Surveillance: an infrared imager supports day and night monitoring of highly restricted facilities, and so helps in spotting and apprehending suspects who may be intruding into the secure area [LETA].

• Marine and Ground Surveillance: the night vision capabilities of a thermal imager help in tracking during the night, for both navigational and surveillance purposes [LETA].

• Structure Profiles: the structure profile of a building obtained with an infrared imager shows the heat radiated by the building, and any unexpectedly excessive radiation can help in checking for unwanted activities [LETA].

• Officer Safety: during night patrols, officers can use infrared imagers to look out for hidden suspects, guard dogs and other dangerous obstacles that are not visible to the human eye in the dark, and they can do so without exposing themselves in the open. They can also use the imagers to see through smoke and dust [LETA].

• Disturbed surface scenarios: a disturbed surface, whether the earth's surface or an artificial one, radiates heat differently even when the disturbance is not visually apparent, so an infrared imager helps in looking for hidden compartments or floors [LETA].

• Environmental: air, water or soil pollutants radiate heat differently than their surroundings; this difference is easily detected with an infrared imager, and the pollutants can be tracked back to their source [LETA].

• Flight safety: infrared imagers give aircraft night-time vision, helping them detect power lines, unlit landing sites and other such obstacles in their path [LETA].


• Fugitive searches and rescue missions: living beings such as humans and animals are excellent radiators of heat, so infrared imaging can be used in search and rescue operations to look for people who are invisible to normal vision because of optical shielding. The radiated heat is easily detected, and a suspect can be spotted in his hideout [LETA].

• Vehicle pursuits: vehicles radiate a lot of heat, both while in use and for some time afterwards. The heat shows up not only from the engine but also from the tires, brakes and exhaust. Using a thermal imager, the police can spot a vehicle that is being driven with its headlights turned off to avoid being seen. A suspect's car that has just entered a parking lot can likewise be detected from its heat emission [LETA].


1.3. Infrared Imaging based Tracking

While surveying the applications of infrared imaging and human tracking using infrared imagery, we collected some useful papers on automatic target detection and tracking using infrared imagery, on face region tracking using infrared, and on 3D tracking based on infrared cameras. Since these papers are closely related to the main focus of our review, we also review them in the section that follows. We have divided this section into two parts. The first, on object tracking, deals with automatic target detection and tracking and with 3D tracking using infrared imagery. The second is on face region tracking.

1.3.1 Object Tracking

In this section we deal with automatic target detection and tracking, based on the paper [Braga-Neto 1999].

In [Braga-Neto 1999] the authors propose a method for automatic target detection and tracking (ATDT) in forward-looking infrared (FLIR) image sequences, a task that is very important for military applications. They employ morphological connected operators to extract and track targets of interest and to eliminate unwanted clutter. These operators are designed around the criteria of general size, connectivity and motion, using spatial intra-frame and temporal inter-frame information. Connected operators are filters that do not modify individual pixel values but instead act at the level of the flat zones of an image. A flat zone is a maximally connected region of the domain of definition of an image over which the gray level is constant. First, the image sequence is filtered frame by frame, and background and residual clutter are eliminated. Since the presence of any targets is now enhanced, a motion-based analysis is conducted on the detected targets. This is accomplished by exploiting the spatiotemporal correlation of the data, expressed as a connectivity criterion along the time dimension. From their experimental results the authors claim that the method is effective and robust over a wide variety of targets and clutter. Although the description above may look simple, ATDT is difficult to accomplish because of the high variability of targets and background clutter and the low spatial resolution of FLIR images.
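The notion of a flat zone can be illustrated with a short Python sketch that labels the flat zones of a small gray-level image. This is an illustration of the definition only, not the implementation used in [Braga-Neto 1999]; the function name, the list-of-rows image format and the choice of 4-connectivity are our own assumptions.

```python
from collections import deque


def label_flat_zones(image):
    """Label the flat zones of a non-empty 2-D image given as a list of rows.

    Two pixels share a label when they are joined by a path of
    4-connected neighbours that all have the same gray level.
    Returns (labels, zone_count), with labels numbered from 1.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    zone_count = 0
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                continue  # pixel already belongs to a flat zone
            zone_count += 1
            labels[r][c] = zone_count
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill over equal gray levels
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not labels[ny][nx]
                            and image[ny][nx] == image[y][x]):
                        labels[ny][nx] = zone_count
                        queue.append((ny, nx))
    return labels, zone_count


if __name__ == "__main__":
    image = [[5, 5, 2],
             [5, 2, 2],
             [7, 7, 5]]
    labels, count = label_flat_zones(image)
    print(count)
    for row in labels:
        print(row)
```

A connected operator may merge flat zones or change the gray level of a whole zone, but it never splits one, which is why contours of surviving objects are preserved.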

The main sources of clutter in such images are sensor noise, natural background texture, and man-made artifacts such as buildings and other objects of no interest in the scene. Because it is inhomogeneous, the background may produce high contrast edges, which can raise the false alarm rate. The technique presented gives an effective and robust method for clutter suppression and normalization, and is consistent over a wide range of illumination conditions. The system has been designed with its main focus on reducing the false alarm rate. The two-step FLIR ATDT algorithm presented in the paper is shown in the figure below.

Figure 2: Block diagram of the FLIR ATDT algorithm implemented by Braga-Neto et al. The picture was obtained from [Braga-Neto 1999].

The algorithm is based on three basic and purely geometrical assumptions about the targets: size (the targets of interest have a specified maximum apparent size), relative position (the targets of interest are situated away from the boundary of the field of view) and motion (the targets of interest have limited motion relative to the FLIR sensor).
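The three assumptions can be read as simple geometric tests on a candidate target. The sketch below is only an illustration of that reading: in [Braga-Neto 1999] the assumptions are built into morphological operators rather than checked on bounding boxes, and the function name, the box format and the parameters `max_size`, `border_margin` and `max_shift` are all hypothetical.

```python
def satisfies_target_assumptions(boxes, frame_shape, max_size,
                                 border_margin, max_shift):
    """Check one candidate against the size, position and motion assumptions.

    boxes: one (top, left, height, width) bounding box per frame, in pixels.
    frame_shape: (rows, cols) of every frame.
    """
    rows, cols = frame_shape
    previous_center = None
    for top, left, height, width in boxes:
        # Size: the apparent size must not exceed the specified maximum.
        if height > max_size or width > max_size:
            return False
        # Relative position: the box must stay away from the frame boundary.
        if (top < border_margin or left < border_margin
                or top + height > rows - border_margin
                or left + width > cols - border_margin):
            return False
        # Motion: the centre may move only a little between frames.
        center = (top + height / 2.0, left + width / 2.0)
        if previous_center is not None:
            shift = max(abs(center[0] - previous_center[0]),
                        abs(center[1] - previous_center[1]))
            if shift > max_shift:
                return False
        previous_center = center
    return True


if __name__ == "__main__":
    track = [(20, 20, 5, 5), (21, 22, 5, 5)]
    print(satisfies_target_assumptions(track, (100, 100), max_size=10,
                                       border_margin=5, max_shift=3))
```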

This algorithm uses a two-step procedure to process the video sequences. In the first step

(Intra-frame processing), they independently process the individual frames of an input

sequence. This is done to detect targets in individual frames based on contrast. This step

detects peaks (hot targets) and dips (cold targets) to obtain all the possible candidate

profiles and processes them in order to arrive at the useful targets. Intra-frame processing

begins with background removal, a process that uses reconstruction top-hat

operators to reduce background clutter and enhance the presence of targets. The second

process in intra-frame processing is adaptive double thresholding. Adaptive double

thresholding is based on morphological reconstruction and is reported to be very robust. The process

finds the associated parameters adaptively for each frame. The authors report good

ATDT performance over a wide range of image sequences. Adaptive double thresholding

has been used instead of simple thresholding because in simple thresholding a large

difference in slicing values causes the image to be contaminated by clutter, while small

differences result in an image with useful targets that may be split into disconnected

regions. Since simple thresholding is very sensitive to the chosen slicing values, a slight shift

in these values may even eliminate these targets completely.
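
The idea behind double thresholding can be illustrated with a minimal sketch. It assumes two fixed slicing values, `low` and `high`, supplied by the caller; the adaptive per-frame parameter selection described in [Braga-Neto 1999] is not reproduced here, and the function name is hypothetical. A pixel is kept if it passes the loose threshold and is connected to at least one pixel that passes the strict threshold, which is what binary morphological reconstruction of the loose mask from the strict markers amounts to.

```python
from collections import deque

def double_threshold(image, low, high):
    """Keep pixels >= low that are 4-connected to at least one pixel >= high."""
    rows, cols = len(image), len(image[0])
    keep = [[False] * cols for _ in range(rows)]
    # Markers: pixels that pass the strict threshold.
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if image[r][c] >= high)
    for r, c in queue:
        keep[r][c] = True
    # Grow the markers inside the mask of pixels that pass the loose threshold.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if (0 <= nr < rows and 0 <= nc < cols
                    and not keep[nr][nc] and image[nr][nc] >= low):
                keep[nr][nc] = True
                queue.append((nr, nc))
    return keep
```

In this sketch a bright target keeps its dimmer fringe, so it is not split into disconnected regions, while isolated clutter that only passes the loose threshold is dropped.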

After the binary detections of the previous step have been combined into one

sequence, most of the false alarms and clutter that survive are removed in the second step

(inter-frame processing). False alarms are eliminated by exploiting the spatiotemporal

correlations in the data, which are given in terms of a dilation-based connectivity criterion

along the time direction. Inter-frame processing consists of 3-D labeling with dilation-based

connectivity; in this process they label the binary detections of the intra-frame step

so that the detections associated with the same target carry the same label. The second

process in inter-frame processing is component filtering. This is done to eliminate

grains that are not consistently detected in the sequence. These grains are taken to be

missed targets, false alarms or targets that are moving too fast to satisfy the motion

criterion. They make the assumption that a valid target should be detected in at least m

consecutive frames. Once the sequence has been labeled by the previous step, the grains

with the same label that do not appear in m consecutive frames are discarded.
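
The component filtering rule can be sketched as follows. This is only an illustration under the assumption that the 3-D labeling step has already produced, for every frame, the set of labels detected in it; the function name and data layout are hypothetical and are not taken from [Braga-Neto 1999].

```python
def filter_components(detections, m):
    """detections: one set of labels per frame, in temporal order.

    Returns the labels that appear in at least m consecutive frames.
    """
    run = {}      # current consecutive-run length of each label
    kept = set()
    for labels in detections:
        # Labels absent from this frame lose their run; present ones extend it.
        run = {label: run.get(label, 0) + 1 for label in labels}
        kept.update(label for label, length in run.items() if length >= m)
    return kept
```

For example, with m = 3 a label seen in frames 0, 1 and 2 is kept, while a label seen only in frames 0 and 4 is discarded as a probable false alarm.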

1.3.2 Face Region Tracking

Another rather popular use of infrared imagery is in face detection and in tracking the

facial regions of a human being as these find applications in face recognition systems,

human-computer interaction and in video surveillance systems. Many systems for face

region tracking have been implemented using visual imagery but, as mentioned before, all

of these suffer from illumination variations in the scene being imaged and changes in

the skin color of the person. Since an infrared image represents emitted radiation and not

reflected light (as in a visual image), it does not suffer from these pitfalls and so

provides useful images in almost any lighting conditions [Eveland 2001].

In [Eveland 2001] a three part human face region-tracking system using a thermal IR

sensor has been presented. Firstly a method for modeling thermal emission from human

skin, which can be used for the purpose of segmentation and detection of human faces in

infrared imagery, is shown. Then human heads are tracked over a period of time by

applying segmentation models to a condensation algorithm. Lastly they have evaluated

the use of tracking results to improve the segmentation procedure. In the first part (i.e.)

modeling the skin in thermal IR they classified pixels in an indoor scene as belonging to

three classes namely exposed skin, covered skin (either by clothing or by hair) and the

rest as background.

They chose infrared imagery because mid-to-long wavelength IR is emitted rather than

reflected which makes it an illumination invariant model. Also there is uniformity in the

emissivity values of skin for different members of the population. This means that such a

setup can perform equally well on skin of all colors.

For the purpose of segmentation they have classified the image pixels in an indoor scene

as belonging to one of three classes and created a probabilistic model from them. The classes

are exposed skin, covered skin (covered by hair or clothing) and the rest as background.

Figure 3: The figure shows left to right the probabilities of skin, covered skin and background in the face

region tracking system implemented by Eveland et al. The picture has been obtained from [Eveland 2001]

Once the segmentation is performed then they use it to track faces in the scene. They

model faces as arbitrarily oriented ellipses, with variable sizes and positions. The major

task in tracking faces for them is in selecting an element of the state space of all such

ellipses for each frame of the video at time t, (i.e.) they have to estimate a probability

density on the state space, encoding the likelihood that the tracked object is in a given

position. On this density a number of estimators can be applied to recover the single state

which will correspond to the object’s parameters. Using the MAP estimator they have

selected the state with the highest likelihood.

They found that calibrated images allowed them to use training data for tracking. They

felt that it was the process of calibration of infrared imagery which made those images

better suited for illumination-invariant robust tracking and thereby gave infrared

imagery the advantage over visual imagery.

1.4. Human Tracking

With an unprecedented increase in concern for security, human tracking is

turning out to be a major area of research. So what makes tracking human

beings so difficult? Tracking humans can easily be considered to be amongst the most complex tasks

in the field of image processing and computer vision. Since it involves recognition of

different bodily shapes and colors, there is no single model that can help us

define all the features that we are looking for [7]. In simple terms, no one model can

represent all humans, and this is what makes the task all the more difficult.

Traditionally human tracking has been done using visual sensors and a lot of work has

been done in this area, but the problem usually encountered is that visual

sensors require proper lighting and suitable operating conditions. Since these sensors

capture the color, shape and texture details of a scene, they

are not illumination and pose invariant [7]. Since vision sensors also give a variable

response with variations in skin color, thermal sensors are now being seriously

considered for the purpose of human tracking. Thermal sensors have some clear

advantages over vision sensors in this regard. The advantages are that

the characteristics of a thermal image of a human are uniform for nearly the entire

population [7] and that thermal sensors outperform vision sensors in

conditions of poor lighting and visibility [7].

1.4.1 Background on Human tracking

Given the above, is it then easy to

accurately track humans in the dark, especially at night, using an infrared sensor?

The answer is yes: it is possible to track a human in the dark since the heat

sensed from the human and from the background will be different, and the difference will

show up distinctly [7]. But the task is not entirely straightforward, because we have to

devise an algorithm that takes into account other sources of heat, such as objects other than

human beings [7]. Another issue that needs to be addressed is the

problem of occlusion [7]. So how do we then measure the performance of a human

tracking system? The main goal of a human tracking system is to accurately detect and

track the presence of human beings in a given field of view in a cost-efficient

manner. The overall setup laid down for implementing the system should be economical.

Complicated computational algorithms, high cost of equipment and high

bandwidth requirements are the major obstacles to implementing a cost-effective

(in terms of computational cost, power requirements and cost of equipment) human

tracking system.

In the survey conducted on Human tracking techniques we have seen that the task can be

accomplished either by employing a single sensor or by employing an array of sensors. It

is observed that a detection system designed based on an array of sensors has certain

inherent advantages with regards to computational efficiency and accuracy of tracking.

An infrared sensor array based tracking system generally consists of many low-power,

low-cost, low-resolution commercial off-the-shelf (COTS) cameras, and since these sensors are

closely spaced the bandwidth requirements are also low. Such a design therefore

makes for a more efficient system. Also, a single sensor based tracking system has a

limited field of view, and this makes the placement of the sensor a very critical issue. A

sensor array based tracking system is better equipped to track objects than a single sensor

system since it can localize the motion of the object. Such a system locates

any detected object motion by employing a large number of networked sensors and then,

by using a technique like triangulation, locates the object more accurately than

is possible using a single sensor system.

The major techniques that are usually employed in the process of tracking humans are

motion detection, background subtraction and template matching. All the above

techniques typically help to cover a broader field of view and get sufficient footage to

allow the system to decide if the object being tracked is a human being or something else.

One of the important tasks that need to be accomplished for tracking humans is to detect

human motion. Human motion is usually simply detected using background subtraction

algorithms. Here we firstly capture a sequence of frames in one particular field of view

with the camera fixed in one position and then compare these frames with a pre-modeled

background image of that same field of view. Such a technique works well if the

background does not change. However, if the background is also changing, then such a

method is not feasible. In such a case human motion detection cannot be accomplished

using elementary background subtraction techniques.
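
Elementary background subtraction with a fixed camera can be sketched in a few lines. The sketch assumes gray-level frames stored as lists of rows and a pre-modeled background image of the same size; the function names and the default threshold are hypothetical choices made for illustration.

```python
def foreground_mask(frame, background, threshold=15):
    """Mark pixels whose gray level differs from the background model."""
    return [[abs(p - b) > threshold for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]

def motion_detected(mask, min_pixels=1):
    """Report motion when enough pixels deviate from the background."""
    return sum(cell for row in mask for cell in row) >= min_pixels
```

Because the background model is static here, any change in the background itself is also flagged as foreground, which is the limitation noted above.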

The field of view (region of interest) in which motion is detected must be checked to

confirm that the object that has caused the motion is indeed a human being. This can be

accomplished using the process of template matching. The object is compared with a

template bearing some features that characterize humans. These features could be shape,

texture or temperature of the body surface. Such features can help in classifying with a

good degree of accuracy that the object that caused motion was a human. Having decided

that the object is a human based on the features of the template, we next need to

continuously track that human.

Human tracking finds applications in search and surveillance operations where a sensor

network may be located strategically in a secure area to track humans. Such a network

would ideally detect the presence and movement of humans in that area which is meant to

be guarded and track any person entering the field of view and then raise an alarm on

intrusion. More advanced and sophisticated versions of similar systems can be employed

in crowded places (i.e.) in areas with high population since here we shall encounter the

problem of changing backgrounds and so such a system will need a complex algorithm.

Human tracking also finds application in big shopping centers and big buildings where

we can track people along lanes, aisles and hallways. People exhibiting unusual and

particularly suspicious behavior can be looked for and their activity can be tracked.

1.4.2 Survey on Human Tracking techniques

In this section we present a review of the survey that we

conducted on human tracking techniques employed to date. Human

tracking is of great importance in both search and surveillance operations as it helps in

guarding secure locations by keeping an eye on the activities conducted by the people in

a certain field of view [7]. Since human tracking systems are nowadays heavily

employed in areas with high pedestrian traffic, such as airports and railway

stations, which are areas with increased security risks, we cannot afford to have a human

being observe the full video footage and look for defects in the image sequences [7].

This is because such a task requires concentration over long periods of time and humans

are not well suited to perform such a task efficiently. Hence a computer automated

process for Human tracking is necessary.

A heterogeneous network of infrared motion detectors and an infrared camera for the

detection, localization, tracking, and identification of human targets is implemented in

[Feller 2002]. The network employs a large number of low cost motion sensors for target

tracking along with a small number of image sensors for image registration. Networks

like these presumably find applications in local and distributed perimeter and site

security. Such networks can be designed to have a serial, a parallel or a tree topology

along with an ad hoc organizational structure [Feller 2002]. There are also some

statistical models that have been devised for ascertaining optimal network configuration.

To make such systems more robust we need to use highly specialized

components, which are expensive both to deploy and to maintain. For setting up a large

network, cost is a prohibitive factor, which calls for cheaper alternatives. Thus

the paper in [Feller 2002] gives an insight into setting up a sensor network using

commercial off-the-shelf (COTS) components. It tries to devise a network that combines a

large number of low-cost motion sensors with a small number of high-resolution imaging

components. While developing such a network the primary considerations were low

power consumption to make the system self-sufficient and long lasting, the use of COTS

technologies to reduce the cost, wireless communication to avoid

wiring and to increase reliability, and the use of infrared sensing to allow it to work in

a variety of lighting conditions. Since the sensors cost a lot less than the

communication and computational components, by using a dense

distribution of low-bandwidth sensors they tried to reduce the computational requirements

and thereby the cost. Also, fewer high-resolution cameras were required since

the low-bandwidth sensors were allowed to characterize the entire environment.

The sensor network was based on the concept of using a large number of relatively low

cost sensors to analyze the environment rather than a single expensive sensor, which

gives less detail. For the purpose of detection, localization and tracking of human

targets 22 motion detectors spread across the imaging space with known location and

orientation were used. The diagram in figure below shows a logical representation of the

sensor network.

Figure 4: shows a logical diagram of the sensor network. The picture has been obtained from [Feller 2002]

The central node received all the sensor data, which it fused to extrapolate the location of

the events in the environment. Based on the location information computed on the control

node the high-resolution camera gets focused in the direction of the location to perform

identification. The location data thus obtained from the control node and the IR camera

are then transmitted to the host computer. The sensor network design supports 256

uniquely identified motion sensors, which once placed in the field, remain in standby

mode waiting to detect motion. Each sensor was made up of two adjacently placed

pyroelectric diodes. It functioned in such a way that if the first diode was activated and

the second diode tripped within a certain period of time, the sensor

would report a motion. Each sensor had a certain waiting time before it reported a new

detection; this was done to avoid repeated reports. This time could be adjusted for each

sensor based on its location in the network. The sensor that detected the motion would

then send an identification signal to the central node to help it locate the sensor's position.

The central node would then use the asynchronously received data from the triggered

sensors, along with the previously determined sensor orientations and locations, to

extrapolate the location of the target. The infrared camera and its control systems are only

used as passive observers in this system and they continuously focus on the detected

target based on the space coordinate information provided by the control node.

After the motion had been detected they had to decide the location of the source. The

probability of detecting motion increased with increase in the density of sensors in the

network. The regions with the highest probability of containing a target could be

ascertained using a two-dimensional back propagation algorithm based on the prior

knowledge regarding the location of the sensor, its orientation and its field of view. When

motion was detected, the pixels corresponding to the field of view of the

sensor were incremented. If more than one sensor detected the same motion, the

intensity of the pixels in the overlapping fields of view would be higher than for detection

by a single sensor. The pixels with the highest intensity marked the areas with the greatest

probability of containing the targets. The coordinates of these highest intensity points help

in determining the camera angle. The back-propagation algorithm was employed for its

flexibility and extensibility. That is, it can easily accommodate the addition or removal of

sensors in the network, and it only requires the orientation and location parameters of the

new or shifted sensors to be modified in the current system. Since each sensor had a fixed

field of view, they could set up a large number of diverse sensors without much

change to the system.

Also using the back-propagation algorithm they could weight the importance of the

sensors and create a network in which the focus was more on certain areas than the

others. As the target moved through the region of coverage, the pixel

map had to reflect these changes by increasing the importance of the new space at the cost of the

previous space. This was achieved by constantly reducing the intensity of the pixels in

the map over time. Hence when a person moved from one region of the map to

another, the intensity in the area in which motion was now detected increased, while

the intensity of the old region, which the person had left, faded.

The map was updated using inputs from the sensors at predefined time intervals.
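
The accumulation and fading of the pixel map can be sketched as below. This is a simplified illustration and not the implementation of [Feller 2002]: the map is a small grid, each sensor's field of view is assumed to be given as a precomputed list of grid cells, and the decay factor, the per-sensor weights and all names are hypothetical.

```python
def update_map(grid, triggered, fields_of_view, decay=0.5, weights=None):
    """One update of the accumulation map.

    grid: 2-D list of intensities; triggered: ids of the sensors that fired;
    fields_of_view: sensor id -> list of (row, col) cells the sensor covers.
    """
    weights = weights or {}
    # Fade old evidence so that regions the target has left lose importance.
    new_grid = [[value * decay for value in row] for row in grid]
    # Every triggered sensor votes for all cells in its field of view.
    for sensor in triggered:
        for r, c in fields_of_view[sensor]:
            new_grid[r][c] += weights.get(sensor, 1.0)
    return new_grid

def most_likely_cell(grid):
    """Cell with the highest accumulated intensity (first one on ties)."""
    cells = ((r, c) for r in range(len(grid)) for c in range(len(grid[0])))
    return max(cells, key=lambda rc: grid[rc[0]][rc[1]])
```

Cells covered by several triggered sensors accumulate the largest values, and the cell returned by `most_likely_cell` is the one toward which the camera would be pointed. Adding or moving a sensor only changes its entry in `fields_of_view`.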

Figure 5: shows the layout of the sensors in the area of surveillance. The picture has been obtained from

[Feller 2002]

Figure 6: The figure on the left shows a graphical interface showing a target located at top left of the space,

the figure on the right shows an infrared image of the target in the imaging space. The pictures have been

obtained from [Feller 2002]

The targets were immediately detected on entering the field of view of the sensor

network, and the infrared camera almost instantly focused in that direction. The camera

could only detect the exact location of a target once more than one sensor had detected

motion. The network was capable of handling changes in motion anywhere in the

network environment. When there was target detection by multiple sensors the camera

was able to focus most of the target in its field of view, even though the image was not

centered. The network can sense motion by any target but it is not capable of detecting

the number of persons that cause this motion.

In [Nakamura 2001] the pros and cons of the ordinary video method and the

infrared video method of tracking passenger movement are

compared. They feel that when using the video method for tracking humans,

if two or more passengers happen to cross each other then their images would

overlap and this would make it difficult to separate the individual trajectories of each

passenger; these, they believe, are inherent pitfalls of the video method due to the

limitations in position and viewing angle of the camera. Hence they feel that using an

infrared video camera they could track passengers more effectively and also overcome

the problem of crossing, since in infrared they could detect hot surfaces such as the human

face. An experiment was set up in which both video data and infrared video data were

collected in a hall where an event was being held. Passenger movement was tracked for

close to three hours.

This data was used for background abstraction. Another experiment was conducted to

create different situations (passengers crossing each other, effects of shadows, different

lighting conditions and different climatic conditions) that could be encountered, in order to

examine the potential of infrared video. First the background images were obtained

and then these were subtracted from the original data so that only the

moving data would be tracked. A movie clip was collected for both a video sequence and an infrared video

sequence. From these video sequences still frames of resolution 720×480

were extracted in BMP format. The frame rate was 30 fps, hence every second

of video generated 30 frames.

Figure 7: The figure on the left shows each pixel of the extracted BMP frame. The figure on the right

shows the histogram of the frame. The peak represents the background in the image. The pictures have

been obtained from [Nakamura 2001]

Next the pixel values in every frame were counted and a histogram was plotted. The value of

each pixel is the average of the values of the color channels R,

G and B. The peak value in this histogram was taken as the value corresponding to the

background, and by subtracting this value they tried to obtain the moving data. Next the

obtained moving objects were labeled. They found that in the ordinary video method the area

near the frame, where the passengers did not appear, was clearly abstracted. Since the

video was of short duration, people who did not move were taken as background. Some

white noise was observed since the man who was being tracked stopped for a while in the

center of the image. The video and infrared images obtained after subtracting the

background image are as shown in figures below.
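
The histogram-based background estimate described above can be sketched as follows. The sketch assumes frames stored as rows of (R, G, B) tuples, uses integer averaging of the channels, and treats a pixel as moving when it deviates from the histogram peak by more than a tolerance; the function names and the tolerance parameter are hypothetical and are not given in [Nakamura 2001].

```python
from collections import Counter

def to_gray(rgb_frame):
    """Average the R, G and B channels of every pixel."""
    return [[sum(pixel) // 3 for pixel in row] for row in rgb_frame]

def histogram_peak(gray):
    """Most frequent gray level, taken here as the background value."""
    counts = Counter(value for row in gray for value in row)
    return counts.most_common(1)[0][0]

def moving_pixels(gray, tolerance=0):
    """Mark pixels that differ from the background value by more than tolerance."""
    peak = histogram_peak(gray)
    return [[abs(value - peak) > tolerance for value in row] for row in gray]
```

As in the experiment, anything that shares the dominant gray level, including a person who stands still for most of a short clip, ends up in the background.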

Figure 8: The figure shows images obtained after subtracting background image (ordinary video). The

pictures have been obtained from [Nakamura 2001]

In ordinary video it was seen that the important factor was the shading contrast of both

the background and the moving objects. When the contrast is poor,

binarization is difficult. This is not a big problem in infrared video since

shading is a function of temperature, and hence setting the threshold and binarizing are

easy when the value of the skin (face region) is selected. The main problem

with this method is that it is difficult to abstract objects when there are similar heat

sources in the background.

Figure 9: The figure shows images obtained after subtracting background image (infrared video). The

pictures have been obtained from [Nakamura 2001]

In [Fang] the authors have developed a "New Night Visionary Pedestrian Detection and Display

System". This is an infrared video based human detection system for tracking pedestrians

on the road at night. They have implemented a two-step static pedestrian

segmentation algorithm. First the regions of interest are segmented from the rest of the

infrared image. This task is achieved easily in infrared images since humans are

heat-radiating bodies; they exhibit higher intensity values in the image, and so

segmentation is performed around the hot spots in the image. Some errors

can be encountered in this process. They arise due to similar or greater heat emission by

objects like cars, light poles and human heads. The next step is to eliminate these

segmentation errors by similarity feature comparison, using a template for identifying a

pedestrian.

Figure 10: The figure shows the results of the first step of image segmentation obtained by Fang et al. The

first two images were the results of initial segmentation and the next three images were the results of

tracking. The pictures have been obtained from [Fang]

Figure 11: The figure shows the results of the second step of segmentation obtained by Fang et al. The first

image is the original infrared image, the second one is the edge map of the original image and the last

image is obtained after applying some morphological operators on the second. The pictures have been

obtained from [Fang]

In [Xu 2002] a method for pedestrian detection and tracking using a night vision video

camera installed on a vehicle has been given. To handle the complex shape of the human

body, a two-step detection and tracking approach has been discussed. Detection is achieved

using a Support Vector Machine (SVM), which employs size-normalized pedestrian candidates.

Tracking is accomplished using Kalman filter prediction and

mean shift tracking. The road detection module helps in the detection phase by providing useful

information for the identification of pedestrians. Human body parts appear as hot

spots in infrared video. In this paper candidate image regions are selected using an estimated

possible pedestrian size and are then classified by the SVM

as pedestrians or non-pedestrians. After this the tracking of the

pedestrians' heads or bodies is accomplished by applying the Kalman filtering prediction

algorithm and the mean shift algorithm.

As mentioned above, the algorithm employed in [Xu 2002] has two stages. In the detection

stage, candidate selection and pedestrian verification are done using SVM. In the

tracking stage a Kalman filter is used to predict the approximate position of the

pedestrians and then the mean shift method is used to determine the exact location of the

pedestrians. The hotspots in the infrared videos are detected using a dynamic threshold of

each frame. The threshold selected by them is:

Threshold = 0.2 × Mean intensity + 0.8 × White intensity

This threshold has been applied to histogram-equalized images for segmentation of

hotspots; the noise encountered in the process of segmentation is suppressed by

performing morphological operations. Segmented hot spots are labeled and then

identification is performed using certain criteria based on sizes and probable areas of

pedestrian location. Candidates were selected based on one of two methods: hotspot

candidate (size estimation using the size of hotspots) or body-ground candidate (size

estimation using distance between the ground and the top of the hotspot).
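
The dynamic threshold above can be computed per frame as in the following sketch. It assumes 8-bit gray-level frames and interprets "White intensity" as the brightest representable level (255); that interpretation, like the function names, is an assumption made for illustration and is not stated in the text. Histogram equalization and the morphological clean-up are not shown.

```python
def hotspot_threshold(gray, white_intensity=255):
    """Threshold = 0.2 * mean intensity + 0.8 * white intensity."""
    pixels = [value for row in gray for value in row]
    mean_intensity = sum(pixels) / len(pixels)
    return 0.2 * mean_intensity + 0.8 * white_intensity

def segment_hotspots(gray, white_intensity=255):
    """Binary mask of pixels at or above the frame's dynamic threshold."""
    threshold = hotspot_threshold(gray, white_intensity)
    return [[value >= threshold for value in row] for row in gray]
```

Because the mean intensity is recomputed for every frame, the threshold follows slow changes in overall scene brightness while staying close to the white level.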

The classification method used by them based on Support Vector Machine (SVM)

estimates the decision boundary between two sets of high dimensional vectors and then

employs these as support vectors to classify data from a similar source. They estimated

the effectiveness of a gray scale pedestrian candidate in comparison to a binary

pedestrian candidate and felt that for minor differences in training data and testing data

gray scale candidate detection worked well while binary candidates were highly shape

sensitive and so the detection rate was low.

Figure 12: The figure shows the results of comparison between gray scale data on the left and binary data

on the right obtained by Xu et al. The pictures have been obtained from [Xu 2002]

The results of the performance comparison between hotspot candidates and body-ground

candidates were that both had a similar detection ratio, but hotspot was a faster, more efficient

and more robust technique. The next experiment that they conducted was to classify the

training set into three types of pedestrians: along-street pedestrian, across-street

pedestrian and bicyclist. Then testing was done using two techniques. In the first method

a single classifier was applied to all the pedestrians. This was found to be a slow and

lengthy process. The second method incorporated multiple classifiers, each for a specific

type of candidate. This method gave more positive results and reduced the training time and

the size of the support vectors, but with so many classifiers the system ran slowly.

Figure 13: The figure shows the results of comparison between positive samples for hotspot candidates on

the left and the positive samples for body-ground candidates on the right obtained by Xu et al. The pictures

have been obtained from [Xu 2002]

Figure 14: The figure shows the three types of pedestrian classes considered Along-street, Across-street

and Bicycle. The pictures have been obtained from [Xu 2002]

After having detected humans in the infrared videos they had to track them. They used

the human head for this purpose since it was a hotspot and its shape did not change

drastically between frames. Two methods were employed to do this. The Kalman filter

method was employed using the below mentioned equations to update the time related

parameters for each frame.

The time update equations used were: -

Priori positions: $S_k^- = \Phi S_{k-1}$

Priori Measurements: $P_k^- = \Phi P_{k-1} \Phi^T + Q$

The measurement update equations used were: -

Kalman gain: $K_k = P_k^- H_k^T (H_k P_k^- H_k^T + R_k)^{-1}$

Posteriori positions: $S_k = S_k^- + K_k (Z_k - H_k S_k^-)$

Posteriori measurements: $P_k = (I - K_k H_k) P_k^-$

Here $S_{k-1}$, $S_k$ and $S_k^-$ are the estimated positions at time k-1, k and at time k before

updating with the error between $S_k$ and $Z_k$ respectively. Similarly $P_k$, $P_{k-1}$ and $P_k^-$ are

error covariance for the current parameters for time k, k-1 and estimated parameters at k

respectively. $\Phi$ is the transform matrix from $S_{k-1}$ to $S_k^-$. Q is the model error and

$Z_k$ is the measurement at time k. $H_k$ is the noiseless connection between the

measurement $Z_k$ and the position $S_k$ at time k. Lastly $K_k$ is the Kalman gain or the

blending factor that minimizes $P_k$.
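
The time update and measurement update can be followed step by step in a scalar sketch. It assumes a one-dimensional state, so that the transform, the measurement connection, the model error and the measurement noise reduce to plain numbers `phi`, `h`, `q` and `r`; the default values and the function name are illustrative assumptions and are not taken from [Xu 2002].

```python
def kalman_step(s_prev, p_prev, z, phi=1.0, h=1.0, q=0.01, r=1.0):
    """One scalar Kalman iteration: returns the posteriori state and covariance."""
    # Time update (prediction).
    s_prior = phi * s_prev
    p_prior = phi * p_prev * phi + q
    # Measurement update (correction).
    gain = p_prior * h / (h * p_prior * h + r)
    s_post = s_prior + gain * (z - h * s_prior)
    p_post = (1 - gain * h) * p_prior
    return s_post, p_post
```

Calling `kalman_step` once per frame with the measured head coordinate as `z` gives a smoothed position estimate; a tracker for image coordinates would use the matrix form of the same equations.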

The head position in a new frame could have been estimated using the information from

previous frames. But since the pedestrian movement was not linear they employed mean

shift method to find the accurate position around the posteriori position. The equation

implemented was as given below.

$$m(x) = \frac{\sum_{s \in S} K(s - x)\, w(s)\, s}{\sum_{s \in S} K(s - x)\, w(s)}$$

Here x is the current position, w(s) (ratio of original gray scale level to current gray scale

level at s) is a weight function and m(x) is the new position. K is the kernel given by

$$K(x) = \frac{1}{4\pi} e^{-x^2}$$
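
The mean shift update can be iterated as in the sketch below. It assumes 2-D sample positions with precomputed weights w(s) and a Gaussian kernel of the form exp(-d^2), where d is the distance between s and x; any constant factor of the kernel cancels in the ratio, so it is omitted. The names, the tolerance and the iteration limit are hypothetical.

```python
import math

def mean_shift_step(x, samples, weights):
    """One update m(x): kernel- and weight-averaged position of the samples."""
    num_x = num_y = den = 0.0
    for (sx, sy), w in zip(samples, weights):
        k = math.exp(-((sx - x[0]) ** 2 + (sy - x[1]) ** 2))  # Gaussian kernel
        num_x += k * w * sx
        num_y += k * w * sy
        den += k * w
    return (num_x / den, num_y / den)

def mean_shift(x, samples, weights, tol=1e-6, max_iter=100):
    """Repeat the update until the position moves by less than tol."""
    for _ in range(max_iter):
        new_x = mean_shift_step(x, samples, weights)
        if math.hypot(new_x[0] - x[0], new_x[1] - x[1]) < tol:
            return new_x
        x = new_x
    return x
```

Started from the posteriori position given by the Kalman filter, the iteration moves toward the nearest local concentration of weighted samples.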

The method implemented by them in [Xu 2002] could track multiple pedestrian bodies

simultaneously in real time. It was found that detection was a time consuming process

compared to tracking. Also, tracking was much more robust since almost no detected

target was lost; however, there could be losses at the detection stage. The shortcoming of

tracking was that it only tracked detected targets and did not look for new persons. Hence

they chose to run detection after every 5 frames or after the tracked targets are lost. Thus

the system claims to combine the robustness of tracking with the ability to

detect new individuals using an interleaved detection stage.
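
The interleaving of detection and tracking can be sketched as a simple scheduling loop. The `detect` and `track` callables stand in for the SVM detector and the Kalman/mean shift tracker and are hypothetical placeholders; the sketch also simplifies by letting a detection pass replace the current target list instead of merging new detections with existing tracks.

```python
def process_sequence(frames, detect, track, interval=5):
    """Run detection every `interval` frames or when no target is tracked."""
    targets = []
    log = []
    for index, frame in enumerate(frames):
        if not targets or index % interval == 0:
            targets = detect(frame)          # slow, finds new individuals
            log.append("detect")
        else:
            targets = track(frame, targets)  # fast, follows known targets
            log.append("track")
    return log
```

The returned log shows which stage ran on each frame, which makes the schedule easy to inspect.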

Figure 15: Results obtained by Xu et al. The left picture shows the detection stage, where a circle denotes a hotspot of the face; in the right picture a square in the face region shows the results of tracking the detected heads. The pictures have been obtained from [Xu 2002]

It was found that a single classifier performed better than multiple classifiers. The detection rate was not high, but the detected frames were evenly distributed in time, which meant that a pedestrian would almost always be detected within a short time span. This allowed them to interleave the detection process with the tracking procedure. They placed more stress on the time it took to detect a human than on the number of frames in which detection was accomplished, because all the detected humans were successfully tracked.

In [Nanda 2002] a real time pedestrian detection system has been presented that works on low level infrared videos and employs probabilistic templates to capture the different shapes of the human body. The infrared videos help to segment the region of interest, and the template is then used to identify pedestrians. They used the raw data, i.e. the intensity value of each pixel in the preprocessed image, to classify a region as pedestrian or non-pedestrian. Their reasoning was that low level infrared video gives images corresponding to the amount of heat radiated by the body, and that this amount varies with the part of the body, the clothes worn, the pose and also the state of mind. Hence the intensity variations are large over a full body region and neighboring pixels are not connected, which is why they did not choose a region based approach. They also felt that the presence of noise, low contrast and ghosting effects in the image would almost rule out the use of an edge-based representation.

Using the raw pixel data, targets were extracted by an elementary thresholding procedure. From training data consisting of 1000 rectangular boxes containing pedestrians, they first calculated the mean and the standard deviation both for pixels in the pedestrian region ($\mu_1$ and $\sigma_1$) and for pixels in the background region ($\mu_2$ and $\sigma_2$). The threshold was then set using a Bayesian classification technique in which the a priori probabilities of the pedestrian region and the background region are assumed to be equal and the intensities are assumed to follow a Gaussian distribution. The equation used is as shown:

$$Threshold = \frac{\sigma_1 \mu_2 + \sigma_2 \mu_1}{\sigma_1 + \sigma_2} + \frac{\sigma_1 \sigma_2 \ln(\sigma_1 / \sigma_2)}{\sigma_1 + \sigma_2}$$

The thresholding technique that is employed is as given below:

$th(x, y) = 1$ if $image(x, y) > threshold$

$th(x, y) = 0$ if $image(x, y) \le threshold$

The resultant image was a binary image in which only the pedestrian was seen and the

background was eliminated.
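The class statistics and the binarization step can be sketched as below. This is a minimal illustration, assuming images stored as lists of rows. The helper `equal_distance_threshold` is a stand-in introduced here: it returns the intensity that lies the same number of standard deviations from both class means, and it is not claimed to be the expression used in [Nanda 2002].

```python
import math


def mean_std(values):
    """Mean and population standard deviation of a list of pixel intensities."""
    n = len(values)
    mu = sum(values) / n
    var = sum((v - mu) ** 2 for v in values) / n
    return mu, math.sqrt(var)


def equal_distance_threshold(mu1, sigma1, mu2, sigma2):
    """Stand-in threshold (assumption): equally many sigmas from both means."""
    return (sigma1 * mu2 + sigma2 * mu1) / (sigma1 + sigma2)


def binarize(image, threshold):
    """th(x, y) = 1 where image(x, y) > threshold, otherwise 0."""
    return [[1 if pixel > threshold else 0 for pixel in row] for row in image]
```

For example, with pedestrian pixels around 200 (standard deviation 20) and background pixels around 100 (standard deviation 30), the stand-in threshold is 160, two standard deviations away from each class mean.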

Next a probabilistic template was developed. They used a training dataset of 1000 rectangular images (128 X 48), all containing humans of the same height but with different poses and orientations. Each training image was thresholded so that the model did not learn intensity variations in either the background or the foreground pixels. Each thresholded image was then shifted so that the centroid of its non-zero pixels exactly matched the geometrical center of the image. Finally, for each pixel of the template, the probability of it being a pedestrian pixel was calculated from the frequency with which it appeared with intensity value 1 in the training data.
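Building such a template from binary training masks can be sketched as follows. This is an illustration under assumptions: masks are non-empty lists of rows holding 0 or 1, the shift is rounded to whole pixels, pixels shifted outside the image are dropped, and the function names are hypothetical.

```python
def centroid(mask):
    """Centroid (row, col) of the non-zero pixels of a binary mask."""
    pts = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    n = len(pts)
    return sum(r for r, _ in pts) / n, sum(c for _, c in pts) / n


def center_mask(mask):
    """Shift the mask so that its centroid lands on the geometric centre."""
    rows, cols = len(mask), len(mask[0])
    cr, cc = centroid(mask)
    dr = round((rows - 1) / 2 - cr)
    dc = round((cols - 1) / 2 - cc)
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and 0 <= r + dr < rows and 0 <= c + dc < cols:
                out[r + dr][c + dc] = 1
    return out


def build_template(masks):
    """Per-pixel frequency of value 1 over the centred training masks."""
    rows, cols = len(masks[0]), len(masks[0][0])
    template = [[0.0] * cols for _ in range(rows)]
    for mask in masks:
        centred = center_mask(mask)
        for r in range(rows):
            for c in range(cols):
                template[r][c] += centred[r][c]
    n = len(masks)
    return [[v / n for v in row] for row in template]
```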

During pedestrian detection, they used the probabilistic template and a test window of size 128 X 48 to estimate the probability that the window contained a pedestrian. They argued that, given the prior information that the window contained a pedestrian, the probability of correct classification was p(x, y) for each pixel of intensity value 1 and 1 - p(x, y) for each pixel of intensity value 0. Using this logic they calculated the probabilities for all the pixels and obtained the combined probability that a given window, with the given prior, would contain a person by summing all the individual probabilities. They assumed that the intensity value at a point was independent of its neighbors.


Figure 16: The figure shows the probabilistic template developed and used by Nanda et al. for detecting

pedestrians. The picture has been obtained from [Nanda 2002]

The equation that was used to calculate the combined probability is as shown below.

$$combinedprobability(i, j) = \sum_{x=1}^{48} \sum_{y=1}^{128} \big( th(x, y)\, p(x, y) + (1 - th(x, y))\,(1 - p(x, y)) \big)$$

Here th was a 128 X 48 window around a pixel (i, j). After calculating the combined probability a probability map was obtained. The mean and the standard deviation of the combined probability were calculated for all the 1000 training samples as well as for 1000 windows (128 X 48) that did not contain pedestrians. This was followed by thresholding the probability map.
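The combined probability of a window and the resulting probability map can be sketched as below. This is an illustration only: the window is anchored at its top-left corner rather than centred on the pixel (i, j), which is a simplification made here, and the function names are hypothetical.

```python
def combined_probability(th, p):
    """Sum over the window of th*p + (1 - th)*(1 - p)."""
    total = 0.0
    for th_row, p_row in zip(th, p):
        for t, prob in zip(th_row, p_row):
            total += t * prob + (1 - t) * (1 - prob)
    return total


def probability_map(binary_image, template):
    """Slide the template over the binary image, one score per window position."""
    rows, cols = len(template), len(template[0])
    out = []
    for i in range(len(binary_image) - rows + 1):
        out_row = []
        for j in range(len(binary_image[0]) - cols + 1):
            window = [row[j:j + cols] for row in binary_image[i:i + rows]]
            out_row.append(combined_probability(window, template))
        out.append(out_row)
    return out
```

A window that agrees with the template everywhere scores close to the number of pixels in the window, while a window that disagrees everywhere scores close to zero, so thresholding the map separates the two cases.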

The system was implemented using 3 different sizes of probabilistic templates, each of which was created using 1000 different pedestrian templates. It was found that the template worked fine even on people who were 25% of scale. The implementation was found to be robust to noise and occlusions.

Figure 17: Results obtained by Nanda et al. The pictures on the left are the input frames and the pictures on the right are the respective output frames. The blue contours in the output frames indicate human heads. The pictures have been obtained from [Nanda 2002]

They were able to track multiple targets simultaneously in real time.

In [Haritaoglu-I 1998] a real time visual system for detecting and tracking people and monitoring their activity outdoors has been presented. The system is called “W4: Who? When? Where? What?” and works on monocular gray scale video images or on infrared images. It uses a combination of shape analysis and tracking to locate people and their body parts such as the head, hands, feet or torso. It also creates a model of the appearance of people so that tracking can continue through interactions such as occlusions, and it is capable of tracking multiple targets at a time even with occlusion. The system constructs a dynamic model of people’s movements to answer the questions what, where and when, and it constructs an appearance model of people to answer the question of who is being tracked. W4 tries to overcome the errors inherent in dynamic image analysis, such as instability of the segmentation process over time and object splitting due to overlap with similarly colored background regions.

The system detects the foreground region in each frame by combining background analysis with simple low level processing of the resulting binary image. The background is modeled statistically by the minimum and maximum intensity values and the maximal temporal derivative for each pixel, recorded over some period of time. These values are estimated over several seconds of video and are then updated over a fixed period of time once the system has ascertained that there is nobody in the foreground. Using the background model, each pixel is classified as belonging either to the background region or to the foreground region.
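A background model of this kind, together with the per-pixel foreground test (a pixel is foreground when its intensity differs from the stored minimum or maximum by more than the largest interframe difference), can be sketched as follows. This is an illustration, not the W4 code: frames are lists of rows of intensities, the training frames are assumed to contain no people, and the function names are hypothetical.

```python
def build_background_model(frames):
    """Per-pixel minimum M, maximum N and largest interframe difference D."""
    rows, cols = len(frames[0]), len(frames[0][0])
    M = [[min(f[r][c] for f in frames) for c in range(cols)] for r in range(rows)]
    N = [[max(f[r][c] for f in frames) for c in range(cols)] for r in range(rows)]
    D = [[max((abs(b[r][c] - a[r][c]) for a, b in zip(frames, frames[1:])),
              default=0)
          for c in range(cols)] for r in range(rows)]
    return M, N, D


def foreground_mask(image, M, N, D):
    """1 where |M - I| > D or |N - I| > D, otherwise 0."""
    return [[1 if abs(M[r][c] - v) > D[r][c] or abs(N[r][c] - v) > D[r][c] else 0
             for c, v in enumerate(row)]
            for r, row in enumerate(image)]
```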

Figure 18: The figure shows the process for motion estimation of body using Silhouette edge matching

between two successive frames employed by Haritaoglu et al. The pictures from left to right show the input

image, the detected foreground region, alignment in silhouette edges based on difference in median and

final alignment after silhouette correlation. The pictures have been obtained from [Haritaoglu - I 1998]

Any pixel x from the image I belongs to the foreground region if and only if:

$|M(x) - I(x)| > D(x)$ or $|N(x) - I(x)| > D(x)$;

Here M is the minimum intensity, N is the maximum intensity and D is the largest interframe absolute difference; together these images represent the background scene model. To segment the foreground objects from the background in each frame, they first threshold the image, then perform noise cleaning by applying one iteration of erosion to the foreground pixels, followed by morphological operations such as erosion and dilation, and finally perform object detection. According to them, since striking a satisfactory combination of erosion and dilation for outdoor images is a difficult task, they apply morphological operators to foreground pixels only after the process of noise elimination. In effect the system reapplies background subtraction followed by a single iteration of dilation and erosion only to those areas identified as foreground. Lastly a binary connected component analysis is applied to the foreground pixels to uniquely label each foreground object. The system is capable of tracking objects even when its algorithm does not segment a person as a single foreground object. This is anticipated in cases such as temporary occlusion or when the object has been split into pieces. In such an event the system uses local correlation techniques to attempt to track parts of the interacting objects.
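The connected component step that assigns a unique label to each foreground object can be sketched as below. This is an illustration only, assuming a binary mask stored as a list of rows and 4-connectivity; the connectivity actually used by W4 is not stated in the text, and the function name is hypothetical.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions of a binary mask with 1, 2, 3, ..."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                count += 1                      # start a new object
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                    # breadth-first flood fill
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```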

Figure 19: The figure shows an example of how temporal templates are updated over time. The pictures

have been obtained from [Haritaoglu - I 1998]

Besides tracking the human body as a whole, the system also aims to locate body parts such as the head, hands, torso, legs and feet and to understand the actions they undergo. W4 accomplishes these tasks using shape analysis and template matching techniques, an approach chosen for tracking body parts when some parts of the human body are occluded and the shape is not predictable. The shape model has been implemented using a Cardboard Model, which represents the relative positions and sizes of body parts. The cardboard model, along with second order predictive motion models of the body, can be used to predict the position of humans in different frames of a video sequence. The cardboard model used is representative of a person in an upright standing pose and is used to track the body parts (head, torso, feet, hands, legs). First, the pixels inside the boxes of the model are used to calculate the principal axis, which helps in estimating the pose of the body parts that are being tracked. The head is located first, followed by the torso and legs, then the hands, and finally the feet are located as end regions. After the positions are predicted they are confirmed using temporal texture templates.
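One common way to compute the principal axis of a set of pixels is from their second order central moments, sketched below. The text does not state how W4 computes the axis, so the moment-based formula is an assumption made here for illustration, and the function name is hypothetical.

```python
import math


def principal_axis_angle(points):
    """Orientation in radians of the principal axis of a set of (x, y) pixels."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    mu20 = sum((x - mx) ** 2 for x, _ in points) / n        # spread along x
    mu02 = sum((y - my) ** 2 for _, y in points) / n        # spread along y
    mu11 = sum((x - mx) * (y - my) for x, y in points) / n  # joint spread
    return 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
```

An upright torso box would give an angle near 90 degrees in this convention, while an outstretched arm would give an angle near 0.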


Figure 20: The figure shows an example demonstrating the use of cardboard model to locate body parts of

humans using infrared imagery. The pictures have been obtained from [Haritaoglu - I 1998]

W4 is a real time system and processes 20-30 frames per second (depending on image resolution) on a dual Pentium processor. It is capable of tracking multiple people at the same time against a complex background.

1.5. Calibration of infrared Cameras

In this section we very briefly discuss radiometric calibration of infrared cameras and present our commercial survey of black body calibrators. The output of an infrared camera is the sum of the radiation emitted by the area being imaged, the radiation emitted by the surroundings (background) and reflected by the target, and the radiation emitted by the atmosphere itself. Calibration is a process in which raw data collected from infrared cameras is converted into a standardized format, so that images of a scene captured with different cameras give the same information. Calibration is needed to obtain accurate temperature information from the imaged scene. The infrared camera can be calibrated either by using a reference blackbody source of known temperature or by spectral radiometric calibration against calibrated reference detectors.

Radiometric calibration establishes a direct one to one relationship between the gray level response at a pixel and the absolute amount of thermal emission from the corresponding scene element [Socolinsky 2001]. This relationship is called responsivity. Thermal emission is measured as flux, in units of W/cm². For LWIR cameras the gray level response of thermal IR pixels is linear with respect to the incident thermal radiation. The slope of the responsivity curve is the gain and the y-intercept is the offset. The offset and gain vary significantly from pixel to pixel over an infrared focal plane array. In the process of radiometric calibration, images of a black-body radiator covering the entire field of view are obtained at two known temperatures [Socolinsky 2001]. Next the gain and offset are computed using the radiant flux of that black-body at each temperature. For this we need the emission curve of that black-body. The emitted flux is a function of temperature and wavelength and is given by Planck’s Law, which states that the flux emitted at the wavelength λ by a blackbody at a known temperature T is as given below.

$$W(\lambda, T) = \frac{2 \pi h c^2}{\lambda^5 \left( e^{hc / \lambda k T} - 1 \right)}$$

Here h is Planck’s constant, k is Boltzmann’s constant and c is the speed of light in vacuum. Thus the flux observed by a sensor is as given in the equation below, where R(λ) is the responsivity.

$$W(T) = \int W(\lambda, T)\, R(\lambda)\, d\lambda$$
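Planck's law and the band integral seen by a sensor can be evaluated numerically as sketched below. This is an illustration under assumptions: SI units are used (so the flux is in W/m² rather than W/cm²), the default band of 8 to 14 µm and the flat responsivity R(λ) = 1 are placeholders chosen here, and the integral is approximated with the trapezoidal rule.

```python
import math

H = 6.62607015e-34   # Planck's constant, J*s
C = 2.99792458e8     # speed of light in vacuum, m/s
K_B = 1.380649e-23   # Boltzmann's constant, J/K


def planck_exitance(wavelength_m, temp_k):
    """Blackbody flux W(lambda, T) in W/m^2 per metre of wavelength."""
    x = H * C / (wavelength_m * K_B * temp_k)
    return 2.0 * math.pi * H * C ** 2 / (wavelength_m ** 5 * math.expm1(x))


def band_flux(temp_k, lo_m=8e-6, hi_m=14e-6,
              responsivity=lambda lam: 1.0, steps=2000):
    """Trapezoidal estimate of the integral of W(lambda, T) * R(lambda)."""
    dlam = (hi_m - lo_m) / steps
    total = 0.0
    for i in range(steps + 1):
        lam = lo_m + i * dlam
        weight = 0.5 if i in (0, steps) else 1.0   # half weight at the two ends
        total += weight * planck_exitance(lam, temp_k) * responsivity(lam)
    return total * dlam
```

A measured responsivity curve can be passed in place of the flat placeholder to obtain the flux that a particular sensor would observe.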

This radiometric calibration holds only under the surrounding conditions in which it was performed; i.e. if a camera was radiometrically calibrated indoors, taking it outdoors where the ambient temperature is significantly different will cause the gain and offset of the linear responsivity of the focal plane array pixels to change, and the radiometric calibration would have to be repeated. This effect is mainly due to the optics and the heating up of the FPA, which causes the sensor to see more energy [Socolinsky 2001]. Radiometric calibration thus tends to standardize all thermal IR data collections, whether they are taken under different conditions, with different cameras or at different times. In the process of radiometric calibration, the spectral response of the system is obtained first and then the radiometric calibration is done.
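For a pixel with a linear response, the gain and offset follow from two blackbody views at known flux levels, as sketched below. This is an illustration assuming the model gray = gain * flux + offset for a single pixel; the function names are hypothetical, and in practice the computation would be repeated for every pixel of the focal plane array.

```python
def two_point_calibration(gray1, flux1, gray2, flux2):
    """Gain and offset of one pixel from two (gray level, flux) pairs."""
    gain = (gray2 - gray1) / (flux2 - flux1)   # slope of the responsivity line
    offset = gray1 - gain * flux1              # y-intercept
    return gain, offset


def gray_to_flux(gray, gain, offset):
    """Invert the linear response to recover the flux seen by the pixel."""
    return (gray - offset) / gain
```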

1.5.1 Survey on Black body calibrators for IR cameras

Objects which are not at a temperature of absolute zero radiate energy in the form of electromagnetic (EM) waves. A blackbody is an idealized system that absorbs all the radiation it receives and, at a given temperature, radiates at least as much thermal energy at every wavelength across the entire spectrum as any other body at that temperature. An ideal blackbody never exists in practice; only some specially developed laboratory sources emit radiation with up to 98% efficiency, which makes them comparable to a blackbody. The table below reviews the survey conducted on commercially available black body calibrators for infrared cameras.

| Model | Accuracy | Target size | Operating temperature range | Emissivity | Power | Dimensions | Price |
|---|---|---|---|---|---|---|---|
| Omega BB-2A | ±2% of rdg (±5%) | 6” (dia) plate | 100°F - 662°F | 0.95 | 115 V ac; 50/60 Hz | 5” X 6.3” X 5” | $650 |
| Omega BB-4A | ±1°C; ±0.25%; (±1.8°F ±0.25%) rdg | 0.88” (dia) plate | 212°F - 1800°F | 0.99 | 115 V ac; 50/60 Hz or 230 V ac; 50/60 Hz, 400 W | 7.5” X 16.12” X 10.4” | $3595 |
| Omega BB-701 | ±0.8°C +1 digit (±1.4°F) [worst case] | 2.5” (dia) plate | 0°F - 300°F | 0.95 | BB701: 115 V ac; 50/60 Hz, 175 W. BB701-230VAC: 230 V ac; 50/60 Hz, 175 W | 7.75” X 14.128” X 15.5” | $2995 |
| Omega BB-703 | ±1.4°C (±2.5°F) | 1.125” (dia) plate | 20°F - 752°F | 0.95 | BB703: 115 V ac, 50/60 Hz, 175 W. BB703-230VAC: 230 V ac, 50/60 Hz, 175 W | 5” X 2.2” X 6.1” | $890 |
| Omega BB-704 | ±0.8°C (±1.4°F) | 4” (dia) plate | 212°F - 752°F | 0.95 | 115 V ac, 50/60 Hz or 230 V ac, 50/60 Hz, 425 W | 16.12” X 7.5” X 10.38” | $2495 |
| Omega BB-705 | ±0.25% of reading; ±1°C | 1.75” (dia) plate | 212°F - 1915°F | 0.99 | 115 V ac, 50/60 Hz or 230 V ac, 50/60 Hz | 22.3” X 20.5” X 23.6” | $9995 |
| Hotek Model 988 | ±0.3°F | 2.76” (dia) plate | 70°F - 115°F | 0.97 ±0.02 | 70 W | 9.06” X 8.86” X 4.53” | _____ |
| Nagman BBSL | Within 0.5% of indicated temperature with a minimum of 3°F | 0.49” (dia) aperture | 45°F - 1112°F | Better than 0.97 | 220 V ac ±10%, 50 Hz / 500 W | 9.84” X 12.4” X 5.31” | _____ |
| Nagman BBSH | Within 0.5% of indicated temperature with a minimum of 5°F | 0.98” (dia) aperture | 932°F - 2192°F | Better than 0.97 | 220 V ac ±10%, 50 Hz / 1000 W | 14.76” X 12.2” X 5.91” | _____ |
| Hart Scientific HR 9132 | 0.9°F - 212°F; ±1.4°F - 932°F | 2.25” (dia) | 122°F - 932°F | 0.95 ±0.02 from 8 to 14 µm | 115 V ac (±10%), 3 A or 230 V ac (±10%), 1.5 A, switchable, 50/60 Hz | 4” X 6” X 7” | $2570 |
| Hart Scientific HR 9133 | ±0.25% of reading; ±1°C | 2.25” (dia) | –22°F - 302°F at 73°F ambient | 0.95 ±0.02 from 8 to 14 µm | 115 V ac (±10%), 1.5 A, 230 V ac (±10%), 1.0 A, switchable, 50/60 Hz | 6” X 11.25” X 10.5” | $3710 |
| MIKRON M340 | ±3°C | 2” (dia) aperture | –4°F - 300°F | 0.99 (+0.005 – 0.000) | 115 V ac ±5%, 50/60 Hz, 300 W max. (230 V ac optional) | 6.57” X 11.02” X 11.02” | _____ |
| MIKRON M310, M315 | ±0.25% of reading ±1°C | 3” (dia) aperture | 41°F - 662°F | 0.99 (+0.005 – 0.000) | 115 V ac ±5%, 50/60 Hz, 300 W max. (230 V ac optional) | 6.57” X 11.02” X 11.02” | _____ |
| MIKRON M320 (Dual Cavity) | | 3” (dia) aperture | 50°F - 570°F | 0.99 (+0.005 – 0.000) | 230 V ac ±10%, 50/60 Hz, 1.5 kW max. (115 V ac optional) | 25.2” X 19.69” X 21.65” | _____ |
| MIKRON M305 | ±0.25% of reading ±1 digit | 1” (dia) aperture | 210°F - 1830°F | 0.995 (+0.0005 – 0.0000) | 115 V ac ±10%, 50/60 Hz, 1.0 kW max. (230 V ac optional) | 10.63” X 16.93” X 14.57” | _____ |
| MIKRON M335 | ±0.4% of reading ±1 digit | 0.65” (dia) aperture | 570°F - 2730°F | 0.99 (+0.003 – 0.000) | 115 V ac ±10%, 50/60 Hz, 2.0 kW max. (230 V ac optional) | 25.2” X 19.69” X 21.65” | _____ |
| MIKRON M330 | | 1” (dia) aperture | 572°F - 3100°F | 0.99 (+0.005 – 0.000) | 208 to 230 V ac ±10%, 50/60 Hz, 15 kW | 67.32” X 22.05” X 32.28” | _____ |

Table: Review of the commercial survey of blackbody calibrators for infrared cameras conducted by us.


1.6. Conclusions

We have completed a survey on infrared imaging with the main focus on its use in human tracking systems. Different applications of infrared imaging were studied. Approaches to human tracking using infrared imaging were reviewed. We also looked at object tracking and face region tracking in infrared imaging. A commercial survey of available black body calibrators for infrared cameras was presented.

1.6.1. Summary

We presented a brief review of the major applications of infrared imaging and conducted an in depth survey of its use in human tracking systems. The other applications of infrared imaging that were touched upon were motion detection, face recognition, pattern recognition, intrusion detection, surveillance and others, but we were very brief about their details. The survey mainly focused on general object tracking and human tracking techniques using infrared imaging. The papers reviewed tracked humans using either a single sensor or a sensor network. Human tracking was accomplished by first performing motion detection and then performing template matching to confirm that the object that caused the motion was a human being. During tracking, features such as the human face, head, bust and silhouette were looked for. We also looked at some of the calibration techniques for infrared cameras and presented a commercial survey of available black body calibrators for infrared cameras.

1.6.2. Future Work

Future tasks would involve designing our own infrared imaging based human tracking system. For this we would have to design a motion detection algorithm, a template matching algorithm for ascertaining that the motion was caused by a human being, and a tracking system to continuously track the detected humans. We could borrow some ideas from past work to get started and then improve upon them to reach our own approach.

References

Papers

[Feller 2002] Feller Steven D., Evan Cull, David Kowalski, Kyle Farlow, John Burchett,

Jim Adleman, Charles Lin, David J. Brady, “Tracking and imaging humans on

heterogeneous infrared sensor array for tactical applications”, SPIE Aerosense 2002,

April, 2002.

[Nakamura 2001] Nakamura Masanobu, Huijing Zhao, Ryosuke Shibasaki, “Tracking

passenger movement with infrared video data”, Proc. ACRS 2001 - 22nd Asian

Conference on Remote Sensing, Vol. 2, pp. 1520-1523, Singapore, 5-9 November 2001.

[Fang] Fang Yajun, Ichiro Masaki & Berthold K. P. Horn, “New night visionary

pedestrian detection and display systems”, Artificial Intelligence Laboratory, MIT,

Cambridge, MA.

[Xu 2002] Xu Fengliang, Kikuo Fujimura, “Pedestrian detection and tracking with night vision”, Proc. IEEE Intelligent Vehicles Symposium, Versailles, France, 18-20 June, 2002.

[Haritaoglu-I 1998] Haritaoglu Ismail, David Harwood and Larry S. Davis, “W4: Who?

When? Where? What? a real time system for detecting and tracking people”, Proc. Third

International Conference on Face and Gesture Recognition, pp. 222-227, Nara, Japan,

April, 14-16, 1998.

[Nanda 2002] Nanda Harsh and Larry Davis, “Probabilistic template based pedestrian

detection in infrared videos”, Proc. IEEE Intelligent Vehicle Symposium, Versailles,

France, 18-20 June, 2002.


[Eveland 2001] Eveland Christopher K., Diego A. Socolinsky, Lawrence B. Wolff,

“Tracking human faces in infrared video”, CVPR Workshop on Computer Vision beyond

the Visible Spectrum, Kauai, December 2001.

[Braga-Neto 1999] Braga-Neto Ulisses, Manish Choudhary and John Goutsias,

“Automatic target detection and tracking in forward-looking infrared image sequences

using morphological connected operators”, 33rd Annual Conference on Information

Sciences and Systems - CISS'99, Vol. I, pp. 173-178, Baltimore, MD, March 1999.

[Bodor 2003] Bodor Robert, Bennett Jackson, Nikolaos Papanikolopoulos, “Vision-based

human tracking and activity recognition” Proc. of the 11th Mediterranean Conf. on

Control and Automation, 18-20 June, 2003.

[Magneau 2002] Magneau Olivier, Patrick Bourdot, Rachid Gherbi, “3D tracking based

on infrared cameras”, Proc. International Conference on Computer Vision and Graphics,

Zakopane, Poland, September, 2002.

[LETA] The Law Enforcement Thermographers Association (LETA), “A Paper on 11

applications of infrared imaging recognized by them”

[Socolinsky 2001] Socolinsky Diego A., Lawrence B. Wolff, Joshua D. Neuheisel,

Christopher K. Eveland, “Illumination invariant face recognition using thermal infrared

imagery”, Proc. IEEE Computer Society Conference on Computer Vision and Pattern

Recognition (CVPR 2001), 2001.

[Xu 2003] Xu Fengliang, Kikuo Fujimura, “Human detection using depth and gray images”, Proc. IEEE International Conference on Advanced Video and Signal based Surveillance, Miami, FL, 21-22 July 2003.

[Haritaoglu-II 1998] Haritaoglu Ismail, David Harwood and Larry S. Davis, “Ghost: A

human body part labeling system using silhouettes”, 14th International Conference on

Pattern Recognition, Brisbane, pp.77-82, Australia, 16-20 August, 1998.

http://mha.cs.umn.edu/Papers/Vision_Tracking_Recognition.pdf

[Kim 2003] Kim Young-Ouk, Joonki Paik, Jingu Heo, Andreas Koschan, Besma Abidi

and Mongi Abidi, “Automatic face region tracking for highly accurate face recognition in

unconstrained environments”, Proc. IEEE International Conference on Advanced Video

and Signal based Surveillance, Pages 29-36, Miami, FL, 21-22 July 2003.

[Prokoski 2000] Prokoski F., “History, current status, and future of infrared

identification”, Proc. IEEE Workshop on Computer Vision Beyond the Visible Spectrum:

Methods and Applications, pp. 5-14, Hilton Head Island, SC, 16 June 2000.

Publications Bibliography:

Fujiwara Hideto, Makiko Seki, Kazuhiko Sumi and Hitoshi Habe, “The Vehicle Tracking

Method using Texture based Background Subtraction”, Proceedings of the 7th Symposium

on Sensing via Image Information, pp. 17-22, 2000.

Jones B., “Design of a remotely operated intrusion detection system for security

applications”, Proceedings of IEEE International Carnahan Conference on Security

Technology, pp. 145-153, 1993.

Nakanishi Yasuto, Kenji Oka, Masayuki Kuramochi, Shohei Matsukawa, Yoichi Sato

and Hideki Koike, “Narrative Hand: Applying a fast-finger tracking system for media

art”, 11th International Symposium on Electronic Art (ISEA 2002), 2002.

Proceedings of IEEE International Conference on Advanced Video and Signal based

Surveillance, Miami, FL, 21-22 July 2003. (IRIS Publications Resources).

Shunsuke Kamijo, Yasuyuki Matsushita, Katsushi Ikeuchi and Masao Sakauchi, “Traffic

Monitoring and Accident Detection at Inter sections”, UM3, 2000.

Srivastava Anuj, and Xiuwen Liu, “Statistical hypothesis pruning for identifying faces

from infrared images”, Journal of Image and vision computing, 21(7), pp. 651-661, 2003.


Sugimura Koji, Yasuo Suga and Junichi Tujitani, “Counting system of pedestrian”,

Proceedings of the 7th Symposium on Sensing via Image Information, pp. 357-362, 2001.

B. Maurin, O. Masoud and N. Papanikolopoulos, “Monitoring Crowded Traffic Scenes”,

Proceedings of the IEEE 5th International Conference on Intelligent Transportation

Systems (ITSC 2002), pp 19-24, Singapore, September 3–6, 2002.

C.R. Wren and A.P. Pentland, “Dynamic Models of Human Motion,” Proceedings of the

3rd IEEE International Conference on Automatic Face and Gesture Recognition, April

1998.

Websites

[1] Temperature sensor community website, “Applications of Infrared Imaging on

Temperatures.com”.

http://www.temperatures.com/tiapps.html

[2] A seminar presentation on the applications of infrared imaging, “Teamworknet Inc website”.

http://www.teamworknet.com/ResourceLibrary/Presentations/IEEEThermalPresentation/default.aspx#1

[3] Infrared imaging based automated video security system, “Southwest Research Institute website”.

http://www.swri.edu/4org/d10/autoeng/video/default.htm

[4] Applications of infrared imaging, “Marlow Industries Inc. website”.

http://www.marlow.com/

[5] Applications of infrared imaging, “Australian Thermal Imaging website”.

http://www.thermalimaging.com.au/index.html


[6] A real time infrared tracking system for virtual environments, “ERCIM official website”.

http://www.ercim.org/publication/Ercim_News/enw53/foursa.html

[7] Reasoning and sensing for visual and infrared data, “ECE 573 Fall 2003 internal webpage of Balasubramanian L”.

http://www.imaging.utk.edu/classes/fall2003/modsen/ece573/bala/index.htm

[8] Infrared Imaging in Modular Multipurpose Multi-sensor Robot, “Nikhil Naik's ECE 573 Fall 2003 internal webpage”.

http://www.imaging.utk.edu/classes/fall2003/modsen/ece573/nikhil/webtemplate/index.htm

[9] Black body calibrator manufacturer, “Omega Engineering Inc. website”.

http://www.omega.com/toc_asp/subsectionSC.asp?subsection=K02&book=Temperature

[10] Black body calibrator manufacturer, “Hotek Technologies website”.

http://www.hotektech.com/Isocomp.htm

[11] Black body calibrator manufacturer, “Nagman website”.

http://www.nagman.com/body.asp

[12] Black body calibrator supplier, “Davis Inotek Instruments website”.

http://www.davis.com/showpage.asp?L3ID=1244

[13] Black body calibrator manufacturer, “Mikron Institute website”.

http://www.mikroninst.com/literature/blackbody.pdf
