A Survey on Infrared Imaging with Focus on Human Tracking
Nikhil Arun Naik
ECE 671
SPRING 2004
Table of Contents
Abstract
1.1. Introduction
1.1.1 Motivation
1.1.2 Mission
1.2. Infrared imaging
1.2.1 Overview
1.2.2 General Applications
1.3. Infrared imaging based tracking
1.3.1 Object tracking
1.3.2 Face region tracking
1.4. Human tracking using infrared imaging
1.4.1 Background on human tracking
1.4.2 Survey on human tracking techniques
1.5. Calibration of infrared cameras
1.5.1 Survey on black body calibrators for IR cameras
1.6. Conclusions
1.6.1 Summary
1.6.2 Future work
References: Papers; Bibliography; Websites
Abstract
Our project, A Survey on Infrared Imaging with Focus on Human Tracking, reviews the major applications of infrared imaging and then surveys in depth its use in human tracking systems. Other applications of infrared imaging, including motion detection, face recognition, pattern recognition, intrusion detection and surveillance, are touched upon only briefly.
The survey focuses mainly on general object tracking and human tracking techniques that use infrared imaging. Besides continuously monitoring the movements of a person, a human tracking system allows us to locate people and to detect humans and other objects in a given field of view. Humans are typically tracked by looking for features such as the face, head, bust and silhouette, and by looking for changes in temperature over a region of the image over a period of time. We also look at some calibration techniques for infrared cameras and present a commercial survey of the black body calibrators available for infrared cameras.
1.1. Introduction
As the title and the abstract indicate, this project is a survey of infrared imaging with its main focus on human tracking systems. Infrared imaging is a subfield of image processing that is developing quickly, and its scope is expected to grow in the coming years because of the strong emphasis now placed on security systems. What is infrared? Infrared is the band of the electromagnetic spectrum whose wavelengths are longer than those of visible light; [2] places it at about 2 µm to 100 µm, while it is more commonly taken to extend from about 0.7 µm to 1 mm. The visible spectrum covers only the wavelengths from about 0.4 µm to 0.7 µm: the band just beyond it on the long-wavelength side is the infrared spectrum, and the band on the short-wavelength side is the ultraviolet spectrum [2]. Infrared light behaves very much like visible light [2]. It travels at the speed of light (about 2.998 × 10^8 m/s) and, like visible light, it can be reflected, refracted, absorbed and emitted [2].
An infrared image is a pattern that varies with the temperature distribution of the area or object being imaged [2]. It is obtained on the principle that the vibration and rotation of the atoms and molecules in an object cause it to radiate heat, which an infrared sensor captures to form an image [2]. The Stefan–Boltzmann law states that the total power radiated by a black body is proportional to the fourth power of its absolute temperature (for real surfaces, scaled by their emissivity), so the radiated power rises very quickly as the absolute temperature of the object increases [2].
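To make the fourth-power dependence concrete, the short Python sketch below evaluates the Stefan–Boltzmann law for radiant exitance, M = εσT^4. The emissivity parameter (1.0 for an ideal black body) and the example temperatures for skin and a room background are illustrative assumptions, not values taken from [2].

```python
# Stefan-Boltzmann law for a grey body: radiant exitance M = emissivity * sigma * T**4
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def radiant_exitance(temp_kelvin: float, emissivity: float = 1.0) -> float:
    """Total power radiated per unit area (W/m^2) at absolute temperature temp_kelvin."""
    if temp_kelvin < 0:
        raise ValueError("absolute temperature cannot be negative")
    if not 0.0 <= emissivity <= 1.0:
        raise ValueError("emissivity must lie between 0 and 1")
    return emissivity * SIGMA * temp_kelvin ** 4


if __name__ == "__main__":
    # Illustrative (assumed) temperatures: skin surface ~305 K, room background ~293 K.
    skin = radiant_exitance(305.0)
    background = radiant_exitance(293.0)
    print(f"skin:       {skin:.1f} W/m^2")
    print(f"background: {background:.1f} W/m^2")
    print(f"ratio:      {skin / background:.3f}")
```

A temperature rise of only about 4% (293 K to 305 K) raises the radiated power by roughly 17%, which is why warm bodies stand out clearly against cooler surroundings.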
The infrared imaging spectrum can be broadly divided into two wavelength bands [2]: the mid wavelength infrared band (MWIR), covering about 3 µm to 5 µm, and the long wavelength infrared band (LWIR), covering about 8 µm to 14 µm [2]. The choice of band depends on the performance required by the specific application [2].
Figure 1: A diagram of the electromagnetic spectrum, showing the visible spectrum bounded by the infrared and ultraviolet spectra on either side. The picture was obtained from the official website of TEAMWORKnet Inc. and was part of a paper presented by Harry Tittel, Vice President of TEAMWORKnet Inc.
The figure above also shows a zone of strong interference between the two infrared bands [2]; wavelengths between roughly 5 µm and 8 µm are heavily absorbed by the atmosphere, mainly by water vapour. MWIR has been observed to be better suited to hotter objects, or to cases where sensitivity matters less than contrast [2]. MWIR also has the advantage of requiring smaller optics [2]. LWIR has traditionally been preferred where high-performance infrared imaging is required, since it is more sensitive to objects at ambient temperature and transmits better through smoke, mist and fog [2]. MWIR and LWIR differ considerably in background flux, temperature contrast and atmospheric transmission.
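One way to see why MWIR suits hotter objects and LWIR suits objects near ambient temperature is Wien's displacement law, λ_peak = b/T, which gives the wavelength at which a black body radiates most strongly. The Python sketch below computes that peak and reports which of the two bands quoted above contains it. The example temperatures are illustrative assumptions, and where the peak falls is only one of several factors (contrast, atmospheric transmission, optics) in choosing a band.

```python
# Wien's displacement law: lambda_peak = b / T
WIEN_B = 2.897771955e-3  # Wien's displacement constant, m*K

# Band limits in micrometres, as quoted in the text for MWIR and LWIR.
BANDS = {"MWIR": (3.0, 5.0), "LWIR": (8.0, 14.0)}


def peak_wavelength_um(temp_kelvin: float) -> float:
    """Wavelength (micrometres) at which a black body at temp_kelvin emits most strongly."""
    if temp_kelvin <= 0:
        raise ValueError("temperature must be positive")
    return WIEN_B / temp_kelvin * 1e6


def band_of(wavelength_um: float) -> str:
    """Name of the imaging band containing wavelength_um, or 'outside' if neither does."""
    for name, (lo, hi) in BANDS.items():
        if lo <= wavelength_um <= hi:
            return name
    return "outside"


if __name__ == "__main__":
    # Illustrative (assumed) temperatures.
    for label, temp in [("human body", 310.0), ("room-temperature object", 300.0), ("hot exhaust", 800.0)]:
        lam = peak_wavelength_um(temp)
        print(f"{label:>24}: peak {lam:5.2f} um -> {band_of(lam)}")
```

For example, a body at about 300 K peaks near 9.7 µm, inside the LWIR band, while a surface at 800 K peaks near 3.6 µm, inside the MWIR band.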
Mid wavelength band infrared (MWIR)
• Higher resolution, due to smaller optical diffraction [2].
• Higher contrast [2].
• Good only in clear weather conditions [2].
• Transmission is possible in high humidity conditions [2].
Long wavelength band infrared (LWIR)
• Good performance in foggy, hazy and misty conditions [2].
• Transmission is least affected by atmospheric conditions [2].
• Reduced sensitivity to solar glint and fire glare [2].
Table 1: A brief description of the advantages and applications of MWIR and LWIR.
1.1.1. Motivation
In our fall 2003 ECE 573 project, Infrared Imaging in Modular Multipurpose Multi-sensor Robot, and our spring 2004 ECE 574 project, Infrared Imaging Sensor Brick for the MODSEN Robot, we have been building an infrared sensor brick that carries an Omega infrared camera for data capture. Because this system can be set up as a small, self-sufficient device that performs search and surveillance operations on its own, we decided to explore the possibility of running a human tracking system on it. This was the primary motivation for this project, and it led us to conduct a complete literature review of the possible applications of infrared imaging, with special focus on human tracking.
The system we expect to obtain by integrating the two efforts, building the infrared sensor brick and then setting up a human tracking system on it, promises to be of great use in search and surveillance operations. Its inherent advantages are that it is small and light and yet can perform the sophisticated task of human tracking. What makes it special is its ability to capture infrared data and track human beings: in the dark, where human vision fails, infrared imagery senses heat and provides night vision that can be used to detect possible ambushes, plots and hidden enemies. Robotic systems similar to ours have been built successfully in the past using a visible-light camera for human tracking. Further applications that could run on the brick include face recognition and pattern recognition. Since the world is currently seeing an unprecedented increase in concern for safety and security, the demand for such human tracking systems should continue to grow.
1.1.2. Mission
In this project we survey the possible applications of infrared imaging, with the main focus on its use in human tracking systems. We provide a brief review of the major applications of infrared imaging and an in-depth survey of its use in human tracking. Applications such as motion and intrusion detection, face and pattern recognition, and surveillance are touched upon only briefly. Because all of these could become future work for our integrated system, we complete their literature review here.
The survey focuses mainly on general object and human tracking techniques and systems that use infrared imaging. Besides continuously monitoring the movements of a person, a human tracking system would allow us to locate people, follow them, and perhaps alert the concerned authorities by sounding an alarm when people cross certain boundaries in a given field of view. Humans are typically tracked by looking for features such as the face, head, bust and silhouette. We could also look for changes in temperature over a region of the image over a period of time.
1.2. Infrared Imaging
As discussed in the introduction, an infrared image gives us a pattern corresponding to the heat energy radiated by the area or object being imaged. Infrared images are usually used to see things that are not visible to the naked human eye. For instance, a thermal image can provide night vision, letting us detect objects in the dark that would remain invisible in a normal visual image. Night vision is useful in search and security operations, where it lets us sense and detect hidden objects, or people who might have set an ambush or a plot in the dark, that the human eye cannot see in the absence of light.
A thermal image can also help detect objects that are hidden behind visually opaque objects. Because a thermal image shows the pattern of heat energy emitted in an area or by an object, any temperature difference in that area or object shows up very clearly in the image. Thermal images can also be used for face recognition and pattern recognition; in face recognition they can serve as identification markers that control the entry and exit of people at secure locations. Another major application of thermal imagery is quality control in the manufacture of many products, ranging from food and glass to cast iron patterns and moulds, where thermal imagery can help ensure quality assurance and surveillance on the production line. Let us now take a brief look at the field of infrared imaging as a whole, and then at its general applications.
1.2.1 Overview
The origins of thermal imaging go back to the first half of the 1800s and Sir John Herschel [2], who was interested in photography and began recording energy in the infrared spectrum through a number of experiments using carbon and alcohol [2]. The analysis of humans based on temperature differences between body regions, however, is believed to be much older: the ancient Greeks, and the Egyptians before them, knew a method of diagnosing diseases by locating excessively hot and cold regions of the body [2].
According to [2], infrared imaging finds applications in three major areas: monitoring, research and development, and general industrial applications [2]. First, monitoring is used for security purposes: to keep a check on prisoners in a prison area at night, to guard a secure location from intruders, and, in coastal surveillance, to look out for enemy vessels in the dark [2]. Monitoring can also be used to watch for illegal activities and to protect people from possible predators in the dark [2].
Second, monitoring is used to control traffic, whether vehicular or air traffic: vehicular traffic can be monitored at night using infrared imagery, and air traffic can be monitored for takeoff and landing in the dark [2]. Third, monitoring is used in disaster management and rescue operations, where infrared images help locate victims, spot fires and monitor rescue and other operations through smoke and fog [2]. Finally, monitoring is used to obtain disaster footage for television and other media [2].
The research and development applications of infrared imaging are varied and include remote sensing, ecology and medical treatment, thermal sensing and thermal design [2]. In remote sensing, infrared is used for aerial observation of the earth's ocean and land surfaces, for observing volcanic activity, for prospecting for minerals and other natural reserves [2], and in other exploratory operations, which may include military and civilian espionage. In ecology and medical treatment, infrared is used to detect diseases in plants and animals [2]. It can also help in the early detection of diseases such as breast cancer and in eye examinations [2], and it is used to diagnose heart disease and to check blood flow and the functioning of the circulatory system [2]. Infrared is also used in veterinary medicine to diagnose animals such as cats, dogs and horses [2]. In thermal sensing and analysis, the applications include the design of heat engines [2] and the analysis of the thermal insulating or heat conducting capacity of a body or substance [2]. In thermal design, the applications include analyzing heat emission levels and rates [2], evaluating transient thermal phenomena in electronic components [2] and, on a bigger scale, evaluating power plants [2].
The general industrial applications of infrared are facility monitoring, non-destructive inspection and managing manufacturing processes [2]. In monitoring facilities such as a chemical plant or a nuclear reactor, infrared helps in looking for abnormal heat emission levels in boilers or distillers [2]. It can also detect and track gas leaks and faulty piping [2], and it is used to monitor power transmission lines and transformers [2]. In non-destructive testing, infrared is used to inspect buildings, roofs, walls and floor surfaces [2], to inspect cold storage facilities, and to find internal defects in walls and other optically opaque objects [2]. The major industrial application of non-destructive testing is in checking motors, bearings and rings [2]. Managing manufacturing processes accurately is a difficult challenge, but the advantages of infrared can be put to good use there, for example in maintaining the heat distribution in a furnace or in metal rolling processes [2]. Infrared is also used to manage smoke and thermal exhausts and to control the temperature of metal molding processes [2].
1.2.2 General Applications
The section above gave a brief overview of the origins, uses and applications of infrared imaging. In this section we look a little more closely at the known applications of infrared imaging and cover each of them briefly before turning to the application of main interest, human tracking.
The major application of infrared imaging is night vision, that is, seeing people and objects in the absence of light by forming a pattern in response to the heat they radiate. Thermal images help in sensing, and thereby detecting, hidden objects or people that are not visible to the human eye in the dark [8]. This is widely used in both military and civilian search and surveillance applications. Face recognition and pattern recognition are among the more classical applications of infrared imaging [8], because thermal images are largely independent of illumination, so the lighting conditions that affect visible images have little effect on infrared images. Face recognition could be used to hunt down wanted suspects, and pattern recognition could be applied to area surveillance and human detection to watch for unexpected activity in a secure area. Infrared imaging also has many industrial applications, such as quality control in the manufacture of products ranging from food and glass to cast iron patterns and moulds, wherever quality assurance and surveillance on the production line are necessary [8].
In a fire-related disaster, infrared imaging is used to detect leaks and fire, to see through smoke and search for victims, to detect other flammable substances in the area, and to provide vision for the subsequent rescue operations [8]. Besides its military uses on land, infrared imaging also finds applications in naval operations, where it is used to detect oil spills and threats posed by enemy vessels at night [8]. Air forces use it for aerial surveillance and enemy detection [8]. Commercial applications include coverage of disaster footage through smoke and dark areas for use by the media and television networks [8].
The growth of infrared technology has been a big boon for the maintenance industry, since infrared video helps reveal what lies behind visually opaque surfaces and thereby provides vital details for predictive maintenance. The maintenance industry has changed its approach over the years: traditionally, maintenance was considered necessary only after a breakdown; after World War II, it became maintenance to prevent breakdowns; and nowadays it is predictive maintenance [2]. Predictive maintenance techniques include thermal imaging surveys, electromagnetic testing, breaker relay testing, visual testing and leak testing [2], as well as magnetic particle inspection, ultrasonic inspection, vibration analysis, eddy current analysis, transformer oil analysis and X-ray and gamma ray radiography [2]. What makes infrared images special for predictive maintenance is their ability to detect hot spots and temperature differences without any physical contact [2].
The major applications of infrared imaging listed in [2] are summarized below. Infrared imaging is used to inspect heater tubes, in steam/air decoking, to confirm thermocouple readings, to detect flame impingement and refractory breakdown, to inspect condenser fins and to verify spheroid levels [2]. It is also used to check for air leakage in furnaces and to locate gas emissions, and it finds application in the aerospace industry [2]. In electrical systems, thermal images are used to inspect power generators and power substations, to evaluate transformers and capacitors, to inspect both rural and urban overhead distribution lines, to inspect electric motors, and to check motor control centers, starters, breakers, fuses, cables and wires [2].
The automotive applications of thermal imaging include detecting faulty fuel injection nozzles, testing brakes and other engine systems, and evaluating their performance and cooling efficiency [2]. Thermal imaging is also used in motor racing diagnostics for suspension and tire contact [2]. In electronic equipment, it is used to evaluate and troubleshoot printed circuit boards, for thermal mapping of semiconductor device surfaces and in evaluating circuit board components [2]. It is also used to inspect hybrid microcircuits, solder joints and bonded structures [2].
As in other fields, thermal imaging has applications in mechanical systems. Here it is used to inspect boilers and kilns, for building diagnostics and heat loss checks, and to inspect roofing systems [2]. It is also used to inspect burners for flame impingement and burner management, to analyze fuel combustion patterns, and to detect thermal patterns on boiler tubes and measure tube skin temperature during normal or standby operation [2]. It is used to scan and record temperatures in unmonitored areas of a boiler, and to scan the boiler exterior for refractory damage or to locate warmer areas [2]. Infrared images also help detect coke buildup in crude furnaces and flue gas leaks in power plant boilers [2]. Finally, they are used to inspect mechanical bearings, to evaluate heating, ventilation and air conditioning equipment, to evaluate cold storage for cooling losses, and to check refrigeration equipment for insulation leaks [2].
In medical science and veterinary medicine, infrared imaging is used to look for diseases such as breast cancer and arthritis, and in medical examinations for whiplash, back injuries and carpal tunnel syndrome [2]. It is also used in dentistry, to evaluate sports injuries and to monitor recovery [2]. In veterinary medicine, it is used to check for injuries, stress fractures and lameness [2]. Thermal imaging helps detect conditions such as pain and numbness that a normal image would not show, because they are problems in the function of tissues rather than in their structure [5]. Infrared imaging can detect alterations in the body's workings such as pain and inflammation, nerve irritation and dysfunction, angiogenesis (new blood vessel formation), circulatory insufficiencies and treatment efficacy, none of which can be detected by regular imaging [5]. Infrared imaging is used for assessing pain and inflammation: it helps in assessing musculoskeletal and articular pain, in assessing the efficacy of chiropractic, osteopathic, physiotherapy, acupuncture and myotherapy care [5], in assessing post-injury trauma, in post-surgery assessment and in confirming the diagnosis of certain diseases [5].
The next application is assessing nerve dysfunction and irritation. Here infrared imaging helps in correlating musculoskeletal findings with neurological irritation or impairment, and in examining suspected nerve entrapments [5]. It is used in assessing reflex sympathetic dystrophy and complex regional pain syndrome, and in evaluating sympathectomies and nerve blocks [5]. In investigating angiogenesis, infrared imagery helps in breast imaging alongside anatomical screening, in post skin cancer investigations and in assessing the acceptance or rejection of a skin graft [5]. Finally, for circulatory insufficiencies, infrared helps in mapping varicose veins [5].
An infrared imaging system was reportedly used by many countries to detect passengers with a high body temperature entering the country, as a safeguard against the entry of the SARS virus [1]. Infrared imaging is also proving to be a very useful tool for law enforcement professionals, helping them stop crime before it happens [LETA]. Since a thermal imager can measure very small temperature differences, it allows us to see in almost zero lighting conditions by forming an infrared picture from those differences [LETA], and law enforcement authorities can use such pictures to catch criminals. This ability of the thermal imager to help prevent crime is recognized by the Law Enforcement Thermographers Association (LETA), which currently lists 11 accepted applications of thermal imaging in crime prevention [LETA]. An application is accepted once a state or federal court has accepted infrared images obtained in that way as evidence in a case [LETA]. These applications are discussed briefly below.
• Hidden compartments: thermal imaging can help detect hidden compartments in vehicles that may be used to transport illegal drugs, contraband or even people. Since a thermal imager detects changes in the thermal characteristics of a surface caused by an adjoining wall or bulkhead, it highlights structural details that are invisible to the naked eye [LETA].
• Perimeter Surveillance: an infrared imager can help in the day and night
monitoring of highly restricted facilities and thereby help in spotting and
apprehending suspects who may be invading that secure area [LETA].
• Marine and Ground Surveillance: the night vision capabilities of a thermal imager support tracking at night for both navigation and surveillance [LETA].
• Structure Profiles: the structure profile of a building obtained by using an
infrared imager will show the heat radiated by the building, and any unexpected
excessive radiation can help in checking for unwanted activities [LETA].
• Officer Safety: during night patrols, officers can use infrared imagers to look out for hidden suspects, guard dogs and other dangerous obstacles that are not visible to the human eye in the dark, and they can do so without exposing themselves in the open. They can also use the imagers to see through smoke and dust [LETA].
• Disturbed surface scenarios: a disturbed surface, whether the ground or an artificial surface, radiates heat differently even when the disturbance is not visible, so an infrared imager can help in looking for hidden compartments or floors [LETA].
• Environmental: air, water and soil pollutants radiate heat differently than their surroundings; this difference is easily detected with an infrared imager, and the pollutants can be tracked back to their source [LETA].
• Flight safety: infrared imagers give aircraft night-time vision, helping them detect power lines, unlit landing sites and other obstacles in their path [LETA].
• Fugitive searches and rescue missions: living beings such as humans and animals are excellent radiators of heat, so infrared imaging can be used in search and rescue operations to look for people who are hidden from normal vision by optical shielding. The radiated heat is easily detected, so a suspect can be spotted in a hideout [LETA].
• Vehicle pursuits: vehicles radiate a lot of heat, both while in use and for some time afterwards. This heat shows up not only from the engine but also from the tires, brakes and exhaust. Using a thermal imager, the police can spot a vehicle even when it is driving with its headlights turned off to avoid being seen, and a suspect's car that has just entered a parking lot can be identified from its heat emission [LETA].
1.3. Infrared Imaging based Tracking
While surveying the applications of infrared imaging and human tracking with infrared imagery, we collected several useful papers on automatic target detection and tracking, face region tracking and 3D tracking with infrared cameras. Since these papers are closely related to the main focus of our review, we document them in this section, which is divided into two parts: the first covers object tracking, including automatic target detection and tracking and 3D tracking using infrared imagery, and the second covers face region tracking.
1.3.1 Object Tracking
In this section we shall deal with automatic target detection and tracking based on the
paper presented in [Braga-Neto 1999].
In [Braga-Neto 1999], the authors propose a method for automatic target detection and tracking (ATDT) in forward-looking infrared (FLIR) image sequences, a task of great importance in military applications. They employ morphological connected operators to extract and track targets of interest and to eliminate unwanted clutter. These operators are designed around criteria of size, connectivity and motion, using both spatial intra-frame and temporal inter-frame information. Connected operators are filters that do not modify individual pixel values but instead act at the level of the flat zones of an image, where a flat zone is a maximal connected region of the image domain with a constant gray level. First, the image sequence is filtered frame by frame to eliminate background and residual clutter. With any targets present now enhanced, a motion-based analysis is performed on the detected targets by exploiting the spatiotemporal correlation of the data, expressed as a connectivity criterion along the time dimension. From their experimental results, the authors claim that the method is effective and robust over a wide variety of targets and clutter. Although this description may sound simple, ATDT is difficult to accomplish because of the high variability of targets and background clutter and the low spatial resolution of FLIR images.
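Because connected operators act on flat zones, it helps to see what the flat zones of an image are. The Python sketch below labels the flat zones of a small gray-level image with a breadth-first flood fill. The choice of 4-connectivity and the helper name flat_zones are our own assumptions for illustration, not details taken from [Braga-Neto 1999].

```python
from collections import deque
from typing import List, Tuple

Image = List[List[int]]


def flat_zones(image: Image) -> Tuple[List[List[int]], int]:
    """Label the flat zones of a grey-level image.

    A flat zone is a maximal 4-connected set of pixels sharing one grey value.
    Returns (labels, count), where labels[r][c] is the zone index of each pixel.
    """
    rows, cols = len(image), len(image[0])
    labels = [[-1] * cols for _ in range(rows)]
    count = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1:
                continue  # pixel already belongs to a labelled zone
            value = image[r0][c0]
            labels[r0][c0] = count
            queue = deque([(r0, c0)])
            while queue:  # breadth-first flood fill over equal-valued neighbours
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] == -1 and image[nr][nc] == value):
                        labels[nr][nc] = count
                        queue.append((nr, nc))
            count += 1
    return labels, count


if __name__ == "__main__":
    img = [
        [0, 0, 5, 5],
        [0, 9, 5, 0],
        [0, 0, 0, 5],
    ]
    labels, n = flat_zones(img)
    print(n)
    for row in labels:
        print(row)
```

In this example the isolated 0 and the isolated 5 on the right form zones of their own, separate from the larger regions with the same gray levels. A connected operator may merge or remove whole zones like these, but it never splits one.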
The main sources of clutter in such images are sensor noise, natural background texture and man-made structures such as buildings or other irrelevant objects in the scene. Because the background is inhomogeneous, it can produce high-contrast edges, which may raise the false alarm rate. The technique presented by the authors provides an effective and robust method of clutter suppression and normalization that is consistent over a wide range of illumination conditions, and the system was designed with the main goal of reducing the false alarm rate. The two-step FLIR ATDT algorithm presented in the paper is shown in the figure below.
Figure 2: Block diagram of the FLIR ATDT algorithm implemented by Braga-Neto et al. The picture has been obtained from [Braga-Neto 1999].
The algorithm rests on three basic, purely geometrical assumptions about the targets: size, that is, targets of interest have a specified maximum apparent size; relative position, that is, targets of interest are situated away from the boundary of the field of view; and motion, that is, targets of interest have limited motion relative to the FLIR sensor.
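The size assumption is what allows a reconstruction top-hat, the operator used for background removal in the intra-frame step, to separate targets from the background. The Python sketch below is a minimal white top-hat by reconstruction under assumptions of our own choosing: a (2k+1)×(2k+1) square structuring element, 8-connected reconstruction and a tiny hand-made test frame. Bright structures too small to contain the square survive, while the flat background and larger warm regions are removed. It is not the authors' implementation; cold targets (dips) would use the dual operator, the closing by reconstruction minus the image.

```python
from typing import List

Image = List[List[int]]


def erode(image: Image, k: int) -> Image:
    """Grey-level erosion by a (2k+1)x(2k+1) square; pixels outside the image are ignored."""
    rows, cols = len(image), len(image[0])
    return [[min(image[rr][cc]
                 for rr in range(max(0, r - k), min(rows, r + k + 1))
                 for cc in range(max(0, c - k), min(cols, c + k + 1)))
             for c in range(cols)]
            for r in range(rows)]


def reconstruct_by_dilation(marker: Image, mask: Image) -> Image:
    """Repeat 3x3 (8-connected) geodesic dilation of marker under mask until stable."""
    rows, cols = len(mask), len(mask[0])
    current = [row[:] for row in marker]
    while True:
        nxt = [[min(mask[r][c],
                    max(current[rr][cc]
                        for rr in range(max(0, r - 1), min(rows, r + 2))
                        for cc in range(max(0, c - 1), min(cols, c + 2))))
                for c in range(cols)]
               for r in range(rows)]
        if nxt == current:
            return current
        current = nxt


def tophat_by_reconstruction(image: Image, k: int) -> Image:
    """White top-hat: image minus its opening by reconstruction.

    The opening by reconstruction (erode, then reconstruct under the image) keeps
    bright structures that can contain the (2k+1)x(2k+1) square and removes the
    rest, so the difference keeps only bright structures the square cannot fit in.
    """
    opened = reconstruct_by_dilation(erode(image, k), image)
    return [[image[r][c] - opened[r][c] for c in range(len(image[0]))]
            for r in range(len(image))]


if __name__ == "__main__":
    frame = [[10] * 8 for _ in range(8)]  # flat background
    frame[1][1] = 40                      # one-pixel hot target
    for r in range(4, 7):
        for c in range(4, 7):
            frame[r][c] = 30              # 3x3 warm region, large enough to hold the square
    for row in tophat_by_reconstruction(frame, k=1):
        print(row)
```

With k=1, the one-pixel hot spot keeps its full contrast of 30 above the background, while the 3x3 warm region, which the square fits into, is removed along with the background.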
The algorithm processes video sequences in two steps. In the first step (intra-frame
processing), each frame of the input sequence is processed independently to detect
targets on the basis of contrast. This step detects peaks (hot targets) and dips (cold
targets) to obtain all candidate profiles, which are then processed to isolate the useful
targets. Intra-frame processing begins with background removal, which uses reconstruction
top-hat operators to suppress background clutter and enhance the targets. It then applies
adaptive double thresholding, a robust procedure based on morphological reconstruction
whose parameters are found adaptively for each frame; the authors report strong ATDT
performance with it over a wide range of image sequences. Double thresholding is used
instead of simple thresholding because simple thresholding is very sensitive to the
chosen slicing value: a value that is too low lets clutter contaminate the image, while a
value that is too high keeps the useful targets but may split them into disconnected
regions. Even a slight shift in the slicing value can eliminate targets completely.
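The hysteresis idea behind double thresholding can be sketched in a few lines. This is a minimal sketch, assuming fixed low and high thresholds supplied by the caller (the adaptive per-frame parameter selection of Braga-Neto et al. is not reproduced) and using a hypothetical helper name, `double_threshold`. Pixels above the high threshold act as markers, and every 4-connected region of pixels above the low threshold that contains a marker is kept whole, which is binary reconstruction of the low-threshold mask from the high-threshold markers.

```python
from collections import deque


def double_threshold(image, low, high):
    """Keep every 4-connected region of pixels >= low that contains at least
    one pixel >= high (binary reconstruction of the low mask from the high
    markers). image is a 2-D list of intensities; returns a 2-D list of bools."""
    rows, cols = len(image), len(image[0])
    mask = [[image[r][c] >= low for c in range(cols)] for r in range(rows)]
    out = [[False] * cols for _ in range(rows)]
    # Markers: pixels above the high threshold seed the reconstruction.
    queue = deque((r, c) for r in range(rows) for c in range(cols)
                  if image[r][c] >= high)
    for r, c in queue:
        out[r][c] = True
    # Grow the markers inside the low-threshold mask (geodesic dilation).
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and mask[nr][nc] and not out[nr][nc]:
                out[nr][nc] = True
                queue.append((nr, nc))
    return out


image = [
    [0, 0, 0, 0, 0, 0],
    [0, 5, 9, 5, 0, 4],
    [0, 5, 5, 0, 0, 4],
    [0, 0, 0, 0, 0, 0],
]
result = double_threshold(image, low=4, high=8)
```

The faint region on the right passes the low threshold but contains no marker, so it is dropped, while the whole target around the peak value of 9 survives intact rather than being reduced to its brightest pixel.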
In the second step (inter-frame processing), the binary detections from the first step are
combined into one sequence, and most of the false alarms and clutter that survive are
removed. False alarms are eliminated by exploiting the spatiotemporal correlations in the
data, expressed as a dilation-based connectivity criterion along the time direction.
Inter-frame processing first performs 3-D labeling with dilation-based connectivity: the
binary detections of the intra-frame step are labeled so that detections associated with
the same target carry the same label. It then performs component filtering to eliminate
grains that are not consistently detected in the sequence; such grains are taken to be
missed targets, false alarms or targets moving too fast to satisfy the motion criterion.
A valid target is assumed to be detected in at least m consecutive frames, so once the
sequence has been labeled, grains whose label does not appear in m consecutive frames are
discarded.
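The inter-frame step can be sketched as follows. The sketch assumes that each binary frame is given as a set of (row, col) detections, that dilation-based connectivity means linking detections in consecutive frames that lie within a small square neighborhood (the `radius` parameter), and that same-frame detections are linked by 8-connectivity; the helper name `label_and_filter` is hypothetical.

```python
def label_and_filter(frames, radius=1, m=3):
    """frames: list of sets of (row, col) detections, one set per frame.
    Returns the frames with every grain spanning fewer than m frames removed."""
    parent = {(t, r, c): (t, r, c)
              for t, frame in enumerate(frames) for (r, c) in frame}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for t, frame in enumerate(frames):
        for r, c in frame:
            # Spatial links: 8-neighbors within the same frame.
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    if (r + dr, c + dc) in frame:
                        union((t, r, c), (t, r + dr, c + dc))
            # Temporal links: square dilation of size 2*radius+1 into frame t+1.
            if t + 1 < len(frames):
                for dr in range(-radius, radius + 1):
                    for dc in range(-radius, radius + 1):
                        if (r + dr, c + dc) in frames[t + 1]:
                            union((t, r, c), (t + 1, r + dr, c + dc))

    # Links only join frames t and t+1, so each grain covers a contiguous
    # run of frames; its length is last frame - first frame + 1.
    span = {}
    for t, r, c in parent:
        root = find((t, r, c))
        lo, hi = span.get(root, (t, t))
        span[root] = (min(lo, t), max(hi, t))
    keep = {root for root, (lo, hi) in span.items() if hi - lo + 1 >= m}
    return [{(r, c) for (r, c) in frame if find((t, r, c)) in keep}
            for t, frame in enumerate(frames)]


frames = [
    {(5, 5), (0, 9)},   # target plus a one-off false alarm
    {(5, 6), (0, 12)},  # the second blob jumps 3 pixels: too fast to link
    {(6, 6)},
    {(0, 0)},           # isolated false alarm
]
filtered = label_and_filter(frames, radius=1, m=3)
```

Only the slowly moving target, which persists for three consecutive frames, survives; the isolated detections and the fast-moving blob are discarded, mirroring the three kinds of grain the filter is meant to remove.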
1.3.2 Face Region Tracking
Another popular use of infrared imagery is face detection and the tracking of facial
regions, which find applications in face recognition systems, human-computer interaction
and video surveillance. Many face region tracking systems have been implemented using
visual imagery, but, as mentioned before, all of them suffer from illumination variations
in the scene and from differences in skin color between people. Since an infrared image
represents emitted rather than reflected light (as in a visual image), it does not suffer
from these pitfalls, which allows infrared sensors to provide useful images in almost any
lighting conditions [Eveland 2001].
In [Eveland 2001], a three-part human face region tracking system using a thermal IR
sensor is presented. First, a method is shown for modeling thermal emission from human
skin, which can be used to segment and detect human faces in infrared imagery. Next, human
heads are tracked over time by applying the segmentation models within a condensation
(particle filter) algorithm. Lastly, the authors evaluate the use of tracking results to
improve the segmentation procedure. In the first part, modeling skin in the thermal IR,
they classified the pixels of an indoor scene into three classes: exposed skin, covered
skin (by clothing or hair) and background.
They chose infrared imagery because mid-to-long-wavelength IR from skin is emitted rather
than reflected, which makes the model illumination invariant. The emissivity of skin is
also nearly uniform across members of the population, so such a setup can perform equally
well for all skin colors.
For segmentation, they build a probabilistic model over these three pixel classes
(exposed skin, covered skin and background).
Figure 3: The figure shows left to right the probabilities of skin, covered skin and background in the face
region tracking system implemented by Eveland et al. The picture has been obtained from [Eveland 2001]
Once segmentation has been performed, it is used to track faces in the scene. Faces are
modeled as arbitrarily oriented ellipses of variable size and position. The main task in
tracking is then to select, for each frame of the video at time t, an element of the state
space of all such ellipses; that is, they estimate a probability density over the state
space that encodes the likelihood that the tracked object is in a given position. Various
estimators can be applied to this density to recover the single state corresponding to
the object's parameters; using the MAP estimator, they select the state with the highest
likelihood.
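A MAP selection over ellipse states might look like the sketch below. It is hypothetical throughout: the likelihood (pixels inside the ellipse should be skin and pixels outside should not, scored with per-pixel skin probabilities from the segmentation step) is an assumption rather than the model of [Eveland 2001], the candidate states are supplied by the caller instead of being propagated by a condensation filter, and a flat prior is assumed so that MAP reduces to choosing the highest likelihood.

```python
import numpy as np


def ellipse_mask(shape, cx, cy, a, b, theta):
    """Boolean mask of pixels inside an ellipse centred at (cx, cy) with
    semi-axes a, b and rotation theta (radians)."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    dx, dy = xs - cx, ys - cy
    u = dx * np.cos(theta) + dy * np.sin(theta)
    v = -dx * np.sin(theta) + dy * np.cos(theta)
    return (u / a) ** 2 + (v / b) ** 2 <= 1.0


def map_ellipse(skin_prob, states, eps=1e-6):
    """Return the candidate state (cx, cy, a, b, theta) with the highest
    log-likelihood under the hypothetical inside-skin / outside-not-skin model."""
    p = np.clip(skin_prob, eps, 1 - eps)
    best, best_ll = None, -np.inf
    for state in states:
        inside = ellipse_mask(p.shape, *state)
        ll = np.log(p[inside]).sum() + np.log(1 - p[~inside]).sum()
        if ll > best_ll:
            best, best_ll = state, ll
    return best


skin = np.full((40, 40), 0.05)
skin[ellipse_mask(skin.shape, 20, 15, 6, 9, 0.0)] = 0.95  # synthetic face
states = [(20, 15, 6, 9, 0.0), (10, 30, 6, 9, 0.0), (20, 15, 3, 3, 0.0)]
best = map_ellipse(skin, states)
```

A condensation tracker would replace the fixed `states` list with particles resampled and propagated from frame to frame; the MAP step itself stays the same.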
They found that calibrated images allowed them to use training data for tracking, and they
attributed the suitability of infrared imagery for robust, illumination-invariant tracking
to this calibration, which gives infrared imagery its advantage over visual imagery.
1.4. Human Tracking
With an unprecedented increase in concern over security, human tracking is becoming a
major area of research. Why is tracking human beings so challenging? It is arguably among
the most complex tasks in image processing and computer vision: because it involves
recognizing different body shapes and colors, no single model can describe all the
features being looked for [7]. Put simply, no one model can represent all humans, and this
is what makes the task so difficult.
Traditionally, human tracking has been done with visual sensors, and much work has been
done in this area, but visual sensors require proper lighting and suitable operating
conditions. Because these sensors capture the color, shape and texture details of a scene,
they are not illumination or pose invariant [7]. Since vision sensors also respond
differently to different skin colors, thermal sensors are now being seriously considered
for human tracking, and in this regard they offer clear advantages over vision sensors:
the thermal characteristics of humans are uniform across nearly the entire population [7],
and thermal sensors outperform vision sensors in conditions of poor lighting and
visibility [7].
1.4.1 Background on Human tracking
Given the above, one might ask whether it is then easy to track humans accurately in the
dark, especially at night, using an infrared sensor. It is indeed possible to track a human
in the dark, since the heat sensed from humans differs from that of the background and the
difference shows up distinctly [7]. The task is not that straightforward, however, because
the algorithm must account for other heat sources, such as objects other than human
beings [7]. Occlusion is another problem that must be addressed [7]. How, then, should the
performance of a human tracking system be measured? The main goal of such a system is to
accurately detect and track human beings in a given field of view in a cost-efficient
manner, so the overall setup should be economical. Complex computational algorithms,
expensive equipment and high bandwidth requirements are the main obstacles to implementing
a cost-effective (in computation, power and equipment) human tracking system.
Our survey of human tracking techniques shows that the task can be accomplished either
with a single sensor or with an array of sensors. A detection system based on a sensor
array has certain inherent advantages in computational efficiency and tracking accuracy.
An infrared sensor array typically consists of many low-power, low-cost, low-resolution
commercial off-the-shelf (COTS) cameras, and because these sensors are closely spaced, the
bandwidth requirements are also low, which makes for a more efficient system. A
single-sensor system, in contrast, has a limited field of view, which makes sensor
placement critical. A sensor array is better equipped to track objects because it can
localize the object's motion: it detects motion with a large number of networked sensors
and then uses a technique such as triangulation to locate the object more accurately than
a single sensor could.
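As an illustration of the triangulation step, the sketch below intersects the bearing rays reported by two sensors at known positions. The survey does not specify how a sensor array localizes targets, so bearing-only triangulation with exact angles is an assumption here, and `triangulate` is a hypothetical helper.

```python
import math


def triangulate(p1, bearing1, p2, bearing2):
    """Intersect two bearing rays from sensors at p1 and p2, given as (x, y).
    Bearings are angles in radians measured from the +x axis.
    Returns the (x, y) intersection, or None if the rays are parallel."""
    d1 = (math.cos(bearing1), math.sin(bearing1))
    d2 = (math.cos(bearing2), math.sin(bearing2))
    # Solve p1 + t1*d1 = p2 + t2*d2 for t1 using 2-D cross products.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-12:
        return None
    wx, wy = p2[0] - p1[0], p2[1] - p1[1]
    t1 = (wx * d2[1] - wy * d2[0]) / denom
    return (p1[0] + t1 * d1[0], p1[1] + t1 * d1[1])


# Two sensors 10 m apart both see a target at (5, 5).
x, y = triangulate((0.0, 0.0), math.pi / 4, (10.0, 0.0), 3 * math.pi / 4)
```

With more than two sensors, a least-squares intersection of all rays is the usual generalization, which also tolerates noisy bearings.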
The main techniques used to track humans are motion detection, background subtraction and
template matching. Together, these techniques help cover a broader field of view and gather
enough footage for the system to decide whether the tracked object is a human being or
something else.
One important task in tracking humans is detecting human motion, which is usually done
with background subtraction algorithms. A sequence of frames of one field of view is
captured with the camera held fixed, and these frames are compared with a pre-modeled
background image of the same field of view. This works well if the background does not
change; if the background is also changing, elementary background subtraction cannot
detect human motion reliably.
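The elementary background subtraction described above amounts to a per-pixel difference followed by a threshold. This minimal sketch assumes a fixed camera, a static background modeled as the per-pixel median of frames containing no people, and a caller-chosen `threshold`; the function names are hypothetical.

```python
import numpy as np


def model_background(empty_frames):
    """Per-pixel median of frames that contain no moving objects."""
    return np.median(np.stack(empty_frames), axis=0)


def detect_motion(frame, background, threshold):
    """Foreground mask: pixels whose absolute difference from the background
    exceeds `threshold`. Converting to float avoids uint8 wrap-around."""
    diff = np.abs(frame.astype(np.float64) - background.astype(np.float64))
    return diff > threshold


background = model_background([np.full((4, 4), v, dtype=np.uint8) for v in (19, 20, 21)])
frame = np.full((4, 4), 20, dtype=np.uint8)
frame[1:3, 1:3] = 35  # a warm body enters the scene
moving = detect_motion(frame, background, threshold=10)
```

Because the background model is frozen, any lasting change in the scene (a parked car, a heater switching on) would be flagged as motion, which is exactly the limitation noted in the text.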
The field of view (region of interest) in which motion is detected must then be checked to
confirm that the object that caused the motion is a human being. This can be done by
template matching: the object is compared with a template bearing features that
characterize humans, such as shape, texture or body surface temperature. Such features
allow the system to decide with a good degree of accuracy whether the object that caused
the motion was a human. Once the object has been classified as a human, it must be tracked
continuously.
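One common way to compare a candidate region with a template is normalized cross-correlation, shown below as a sketch. The survey does not say which similarity measure is used, so NCC and the helper name `match_template_ncc` are assumptions; the exhaustive sliding loop favors clarity over speed.

```python
import numpy as np


def match_template_ncc(image, template):
    """Slide `template` over `image` and return (best_score, (row, col)) for
    the top-left position with the highest normalized cross-correlation."""
    th, tw = template.shape
    t = template.astype(np.float64) - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    best = (-np.inf, None)
    for r in range(image.shape[0] - th + 1):
        for c in range(image.shape[1] - tw + 1):
            p = image[r:r + th, c:c + tw].astype(np.float64)
            p = p - p.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom == 0:
                continue  # flat patch or flat template: correlation undefined
            score = float((p * t).sum() / denom)
            if score > best[0]:
                best = (score, (r, c))
    return best


template = np.array([[0, 9, 0], [9, 9, 9], [0, 9, 0]], dtype=float)  # toy "warm body" shape
scene = np.zeros((8, 8))
scene[3:6, 2:5] = template
score, pos = match_template_ncc(scene, template)
```

Subtracting the means makes the score insensitive to a uniform offset in temperature, and dividing by the norms makes it insensitive to overall contrast, so a score near 1 indicates a matching shape rather than merely a hot region.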
Human tracking finds applications in search and surveillance operations, where a sensor
network may be placed strategically in a secure area. Such a network would ideally detect
the presence and movement of humans in the guarded area, track any person entering the
field of view and raise an alarm on intrusion. More advanced versions of such systems can
be employed in crowded places, but there the background keeps changing, so they require
more complex algorithms. Human tracking also finds applications in large shopping centers
and buildings, where people can be tracked along lanes, aisles and hallways, and people
exhibiting unusual or suspicious behavior can be identified and their activity followed.
1.4.2 Survey on Human Tracking techniques
In this section we review the human tracking techniques covered by our survey. Human
tracking is of great importance in search and surveillance operations, as it helps guard
secure locations by monitoring the activities of people in a given field of view [7].
Human tracking systems are now heavily employed in areas with high pedestrian traffic and
increased security risks, such as airports and railway stations, where it is impractical
to have a person watch all the video footage looking for anomalies [7]. Such a task
requires concentration over long periods, which humans are not well suited to, so an
automated human tracking process is necessary.
[Feller 2002] implements a heterogeneous network of infrared motion detectors and an
infrared camera for the detection, localization, tracking and identification of human
targets. The network uses a large number of low-cost motion sensors for target tracking
and a small number of image sensors for image registration. Such networks would find
applications in local and distributed perimeter and site security, and they can be
designed with a serial, parallel or tree topology and an ad hoc organizational structure
[Feller 2002]. Statistical models have also been devised for determining the optimal
network configuration. Making such systems more robust usually requires highly specialized
components that are expensive to deploy and maintain, and for a large network this cost
becomes prohibitive, so cheaper alternatives are needed. The paper therefore explores
building a sensor network from commercial off-the-shelf (COTS) components, combining many
low-cost motion sensors with a small number of high-resolution imaging components. The
primary design considerations were low power consumption, to make the system
self-sufficient and long-lasting; COTS technologies, to reduce cost; wireless
communication, to avoid extensive wiring and increase reliability; and infrared sensing, so
that the system works in a variety of lighting conditions. Because the sensors cost much
less than the communication and computational components, a dense distribution of
low-bandwidth sensors was used to reduce the computational requirements and hence the cost.
Fewer high-resolution cameras were also needed, since the low-bandwidth sensors
characterized the entire environment.
The sensor network was based on the idea of analyzing the environment with many relatively
low-cost sensors rather than with a single expensive sensor, which provides less detail.
For detecting, localizing and tracking human targets, 22 motion detectors with known
locations and orientations were spread across the imaging space. The figure below shows a
logical representation of the sensor network.
Figure 4: The figure shows a logical diagram of the sensor network. The picture has been obtained from [Feller 2002]
The central node received all the sensor data and fused it to estimate the location of
events in the environment. Based on the location computed by the control node, the
high-resolution camera was pointed in that direction to perform identification, and the
location data from the control node and the IR camera were transmitted to the host
computer. The design supports 256 uniquely identified motion sensors, which, once placed in
the field, remain in standby mode waiting to detect motion. Each sensor consisted of two
adjacent pyroelectric diodes: if the first diode was activated and the second diode
tripped within a certain period, the sensor reported motion. After a report, each sensor
waited for a certain time before reporting a new detection, to avoid repeated reports; this
waiting time could be adjusted for each sensor based on its location in the network. The
sensor that detected motion sent an identification signal to the central node, which
combined the asynchronously received data from the triggered sensors with their previously
determined orientations and locations to estimate the location of the target. The infrared
camera and its control system act only as passive observers, continuously pointing at the
detected target using the coordinates provided by the control node.
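The reporting logic of a two-diode sensor can be modeled as a small state machine. This is a hypothetical model, not the firmware of [Feller 2002]: the class name, the `window` and `holdoff` values and the assumption that only the first-diode-then-second-diode order counts as motion are illustrative.

```python
class DualPIRSensor:
    """Reports motion when diode A trips and diode B trips within `window`
    seconds; after a report, events are ignored for `holdoff` seconds so the
    same motion is not reported repeatedly."""

    def __init__(self, sensor_id, window=0.5, holdoff=2.0):
        self.sensor_id = sensor_id
        self.window = window
        self.holdoff = holdoff
        self.first_trip = None            # time diode A last tripped
        self.last_report = float("-inf")  # time of the last motion report

    def on_trip(self, diode, t):
        """Feed a trip event ('A' or 'B') at time t. Returns the sensor id
        when a report should be sent to the central node, else None."""
        if t - self.last_report < self.holdoff:
            return None                   # still in the hold-off period
        if diode == "A":
            self.first_trip = t
        elif diode == "B" and self.first_trip is not None:
            if t - self.first_trip <= self.window:
                self.first_trip = None
                self.last_report = t
                return self.sensor_id
            self.first_trip = None        # B came too late: start over
        return None


sensor = DualPIRSensor(sensor_id=7, window=0.5, holdoff=2.0)
events = [("A", 0.0), ("B", 0.3), ("A", 1.0), ("B", 1.2),
          ("A", 3.0), ("B", 3.9), ("A", 4.0), ("B", 4.2)]
reports = [(t, sensor.on_trip(d, t)) for d, t in events]
```

In this run the sensor reports at 0.3 s, suppresses the pair at 1.0 s to 1.2 s because of the hold-off, ignores the slow pair at 3.0 s to 3.9 s, and reports again at 4.2 s. Making `holdoff` a per-instance attribute matches the text's note that the waiting time could be tuned per sensor.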
Once motion had been detected, the location of its source had to be determined. The
probability of detecting motion increased with the density of sensors in the network. The
regions most likely to contain a target were found with a two-dimensional back-propagation
algorithm based on prior knowledge of each sensor's location, orientation and field of
view. When a sensor detected motion, the map pixels corresponding to its field of view were
incremented, so pixels covered by several sensors that detected the same motion reached
higher intensities than pixels covered by a single sensor. The highest-intensity pixels
therefore marked the areas most likely to contain a target, and their coordinates were
used to determine the camera angle. The back-propagation algorithm was chosen for its
flexibility and extensibility: sensors can easily be added or removed, and only the
orientation and location parameters of new or moved sensors need to be updated. Since each
sensor had a fixed field of view, a large number of diverse sensors could be set up without
much change to the system.
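The pixel-map accumulation can be sketched as follows. The sketch assumes that each sensor's field of view is a wedge described by position, heading, half-angle and range (a hypothetical parameterization) and that the camera simply pans toward the highest-valued cell; the function names are also hypothetical.

```python
import math

import numpy as np


def accumulate_detections(shape, sensors, triggered):
    """sensors: dict id -> (x, y, heading, half_fov, max_range), angles in radians.
    Every map cell inside a triggered sensor's wedge is incremented by 1, so
    cells seen by several triggered sensors accumulate higher values."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    grid = np.zeros(shape)
    for sid in triggered:
        x, y, heading, half_fov, max_range = sensors[sid]
        dx, dy = xs - x, ys - y
        dist = np.hypot(dx, dy)
        # Angular offset from the sensor heading, wrapped to (-pi, pi].
        off = np.angle(np.exp(1j * (np.arctan2(dy, dx) - heading)))
        grid[(dist <= max_range) & (np.abs(off) <= half_fov)] += 1
    return grid


def camera_pan(grid, cam_xy):
    """Pan angle from the camera position to the highest-valued cell."""
    row, col = np.unravel_index(np.argmax(grid), grid.shape)
    return math.atan2(row - cam_xy[1], col - cam_xy[0]), (col, row)


sensors = {1: (0, 10, 0.0, 0.3, 30), 2: (10, 0, math.pi / 2, 0.3, 30)}
grid = accumulate_detections((20, 20), sensors, triggered=[1, 2])
pan, (col, row) = camera_pan(grid, cam_xy=(0, 0))
```

Adding or moving a sensor only changes its entry in `sensors`, which reflects the extensibility argument made above.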
The back-propagation algorithm also allowed the sensors to be weighted, so that the network
could focus on some areas more than others. As a target moved through the region of
coverage, the pixel map had to reflect the change by increasing the importance of the new
region at the expense of the previous one. This was achieved by steadily reducing the
intensity of all pixels in the map over time while updating the map with sensor inputs at
predefined intervals: when a person moved from one region to another, the intensity rose in
the region where motion was detected, while the intensity of the region the person had left
faded.
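The fading behavior can be sketched as a decay-then-add update applied at each predefined interval. The exponential decay factor and the function name are assumptions; the text only says that pixel intensities were reduced over time.

```python
import numpy as np


def update_map(grid, detections, decay=0.8):
    """One update at a fixed interval: fade old evidence, then add new.
    detections: boolean masks (one per sensor that fired) of map cells in
    that sensor's field of view. decay < 1 makes vacated regions fade."""
    grid = grid * decay
    for fov in detections:
        grid = grid + fov
    return grid


left = np.array([[True, True, False, False]])
right = ~left
grid = np.zeros((1, 4))
grid = update_map(grid, [left])    # person detected on the left
grid = update_map(grid, [right])   # person moves to the right
grid = update_map(grid, [right])
```

After the person moves, the right-hand cells overtake the left-hand ones within one interval, and the left-hand evidence keeps shrinking geometrically; a smaller `decay` forgets faster but is more sensitive to a single missed detection.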
Figure 5: The figure shows the layout of the sensors in the surveillance area. The picture has been obtained from
[Feller 2002]
Figure 6: The figure on the left shows a graphical interface showing a target located at top left of the space,
the figure on the right shows an infrared image of the target in the imaging space. The pictures have been
obtained from [Feller 2002]
Targets were detected immediately on entering the field of view of the sensor network, and
the infrared camera turned toward them almost instantly. The exact location of a target
could only be determined once more than one sensor had detected motion. The network could
handle motion anywhere in its environment, and when a target was detected by multiple
sensors, the camera kept most of the target in its field of view, even though the image was
not centered. The network can sense motion by any target, but it cannot determine how many
people are causing the motion.
[Nakamura 2001] compares the pros and cons of ordinary video and infrared video for
tracking passenger movement. The authors note that with ordinary video, when two or more
passengers cross each other their images overlap, which makes it hard to separate the
individual trajectories; they regard this as an inherent pitfall of the video method,
arising from the limited position and viewing angle of the camera. They argue that an
infrared video camera can track passengers more effectively and overcome the crossing
problem, since it detects hot surfaces such as the human face. In an experiment, both
ordinary video and infrared video were recorded in a hall where an event was being held,
and passenger movement was tracked for close to three hours.
These data were used for background extraction. A second experiment created situations that
could be encountered in practice (passengers crossing each other, the effects of shadows,
different lighting conditions and different climatic conditions) to examine the
feasibility of infrared video. First, background images were obtained and subtracted from
the original data so that only moving objects remained to be tracked. Movie clips were
collected for both an ordinary video sequence and an infrared video sequence, and still
frames with a resolution of 720 x 480 were extracted from them in BMP format. The frame
rate was 30 fps, so every second of video yielded 30 frames.
Figure 7: The figure on the left shows each pixel of the extracted BMP frame. The figure on the right
shows the histogram of the frame. The peak represents the background in the image. The pictures have
been obtained from [Nakamura 2001]
Next, a histogram of the pixel values in each frame was computed, where the value of each
pixel is the average of its R, G and B channel values. The peak of this histogram was taken
as the background value, and subtracting it yielded the moving objects, which were then
labeled. With ordinary video, the area near the edges of the frame, where passengers did
not appear, was cleanly extracted as background. Because the video was short, people who
did not move were treated as background. Some white noise appeared because the person being
followed stopped for a while in the center of the image. The video and infrared images
obtained after background subtraction are shown in the figures below.
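The histogram-peak background estimate can be sketched as below. It assumes 8-bit RGB frames, a gray value equal to the rounded mean of R, G and B, and a hypothetical `tolerance` for deciding which pixels differ from the background; the function name is also hypothetical.

```python
import numpy as np


def subtract_histogram_background(rgb_frame, tolerance=10):
    """Gray = mean of R, G, B; background = most frequent gray level;
    returns (background_value, mask of pixels that differ from it)."""
    gray = rgb_frame.astype(np.float64).mean(axis=2).round().astype(np.int64)
    hist = np.bincount(gray.ravel(), minlength=256)
    background = int(np.argmax(hist))  # histogram peak
    mask = np.abs(gray - background) > tolerance
    return background, mask


frame = np.full((10, 10, 3), 60, dtype=np.uint8)
frame[2:5, 3:6] = (200, 180, 190)  # a bright 3 x 3 "person"
background, mask = subtract_histogram_background(frame)
```

A single global background value works only when one gray level dominates the scene; it also explains the observation that a person who stands still long enough ends up absorbed into the background.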
Figure 8: The figure shows images obtained after subtracting background image (ordinary video). The
pictures have been obtained from [Nakamura 2001]
With ordinary video, the key factor was the shading contrast between the background and the
moving objects; when the contrast is poor, binarization is difficult. This is less of a
problem with infrared video, since shading is a function of temperature, so the
binarization threshold can easily be set by selecting the value of the skin (face region).
The only difficulty with this method is that objects are hard to extract when there are
similar heat sources in the background.
Figure 9: The figure shows images obtained after subtracting background image (infrared video). The
pictures have been obtained from [Nakamura 2001]
In [Fang], the authors developed a New Night Visionary Pedestrian Detection and Display
System, an infrared-video-based system for detecting pedestrians on the road at night. They
implemented a two-step static pedestrian segmentation algorithm. First, regions of interest
are segmented from the rest of the infrared image. This is easy in infrared images because
humans radiate heat and therefore appear with higher intensity values, so segmentation is
performed around the hot spots in the image. Errors can arise in this process from similar
or greater heat emission by objects such as cars, light poles and human heads. The second
step eliminates these segmentation errors by comparing similarity features against a
template for identifying a pedestrian.
Figure 10: The figure shows the results of the first step of image segmentation obtained by Fang et al. The
first two images were the results of initial segmentation and the next three images were the results of
tracking. The pictures have been obtained from [Fang]
Figure 11: The figure shows the results of the second step of segmentation obtained by Fang et al. The first
image is the original infrared image, the second one is the edge map of the original image and the last
image is obtained after applying some morphological operators on the second. The pictures have been
obtained from [Fang]
[Xu 2002] presents a method for pedestrian detection and tracking using a night vision
video camera installed on a vehicle. To handle the complex shape of the human body, a
two-step detection and tracking approach is used. Detection uses a Support Vector Machine
(SVM) applied to size-normalized pedestrian candidates, and tracking combines Kalman filter
prediction with mean shift tracking. A road detection module supports the detection phase
by providing useful information for identifying pedestrians. Human body parts appear as hot
spots in infrared video; image regions around the hot spots, sized according to an
estimated pedestrian size, are classified by the SVM as pedestrians or non-pedestrians. The
heads or bodies of the pedestrians are then tracked using Kalman filter prediction and the
mean shift algorithm.
As mentioned above, the algorithm in [Xu 2002] has two stages. In the detection stage,
candidates are selected and then verified as pedestrians using the SVM. In the tracking
stage, a Kalman filter predicts the approximate position of each pedestrian, and the mean
shift method then determines its exact location. Hot spots in the infrared video are
detected with a threshold computed dynamically for each frame:

$$\text{Threshold} = 0.2 \times \text{mean intensity} + 0.8 \times \text{white intensity}$$

This threshold is applied to histogram-equalized images to segment hot spots, and the noise
introduced by segmentation is suppressed with morphological operations. The segmented hot
spots are labeled and then identified using criteria based on size and probable areas of
pedestrian location. Candidates were selected by one of two methods: hotspot candidates
(size estimated from the size of the hot spot) or body-ground candidates (size estimated
from the distance between the ground and the top of the hot spot).
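Hot-spot segmentation and size-based candidate selection can be sketched as follows. The sketch assumes an already histogram-equalized 8-bit image and takes "white intensity" to be the maximum gray level, 255 (an assumption); it omits the morphological noise suppression, and its area limits are hypothetical stand-ins for the paper's size criteria.

```python
from collections import deque

import numpy as np


def hotspot_candidates(image, white=255.0, min_area=4, max_area=500):
    """Segment hot spots with Threshold = 0.2*mean + 0.8*white and return the
    bounding boxes (top, left, bottom, right) of 4-connected hot regions whose
    pixel count lies in [min_area, max_area]."""
    threshold = 0.2 * image.mean() + 0.8 * white
    hot = image > threshold
    seen = np.zeros_like(hot)
    h, w = hot.shape
    boxes = []
    for r0, c0 in zip(*np.nonzero(hot)):
        if seen[r0, c0]:
            continue
        seen[r0, c0] = True
        queue, pixels = deque([(r0, c0)]), []
        while queue:  # flood fill one connected hot spot
            r, c = queue.popleft()
            pixels.append((r, c))
            for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                inside = 0 <= nr < h and 0 <= nc < w
                if inside and hot[nr, nc] and not seen[nr, nc]:
                    seen[nr, nc] = True
                    queue.append((nr, nc))
        if min_area <= len(pixels) <= max_area:  # crude size criterion
            rows, cols = zip(*pixels)
            boxes.append((int(min(rows)), int(min(cols)), int(max(rows)), int(max(cols))))
    return boxes


frame = np.full((20, 20), 50.0)
frame[5:9, 5:7] = 255.0  # 8-pixel head-sized hot spot
frame[15, 15] = 255.0    # single hot pixel, rejected as too small
boxes = hotspot_candidates(frame)
```

Because the threshold is weighted heavily toward the white level, only the brightest pixels survive, which suits hot-spot detection after histogram equalization; the mean term raises it slightly in frames that are bright overall.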
Their classifier is based on a Support Vector Machine (SVM), which estimates the decision
boundary between two sets of high-dimensional vectors; the training vectors that define
this boundary are the support vectors, and the boundary is then used to classify data from
a similar source. The authors compared grayscale pedestrian candidates with binary
pedestrian candidates and found that grayscale candidates worked well despite minor
differences between training and testing data, whereas binary candidates were highly
sensitive to shape and gave a low detection rate.
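The SVM configuration of [Xu 2002] (kernel, features, training procedure) is not described here, so the sketch below substitutes a plain linear SVM trained with the Pegasos stochastic sub-gradient method on flattened candidate vectors. It only illustrates the hinge-loss training idea; the function names are hypothetical, and a practical system would normally use an established SVM library.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.01, epochs=200):
    """Pegasos: sub-gradient descent on lam/2*||w||^2 + mean hinge loss.
    X: (n, d) feature vectors (e.g. flattened, size-normalized candidates);
    y: labels in {-1, +1}. A constant 1 feature is appended for the bias."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in range(len(Xb)):  # fixed order keeps the sketch reproducible
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (Xb[i] @ w) < 1:  # inside the margin: hinge loss active
                w = (1 - eta * lam) * w + eta * y[i] * Xb[i]
            else:
                w = (1 - eta * lam) * w
    return w


def predict(w, X):
    Xb = np.hstack([X, np.ones((len(X), 1))])
    return np.where(Xb @ w >= 0, 1, -1)


X = np.array([[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]], dtype=float)
y = np.array([1, 1, 1, -1, -1, -1])
w = train_linear_svm(X, y)
```

The points that end up with a margin of about 1 play the role of support vectors; a grayscale candidate would simply be a longer feature vector than these 2-D toy points.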
Figure 12: The figure shows the results of comparison between gray scale data on the left and binary data
on the right obtained by Xu et al. The pictures have been obtained from [Xu 2002]
Comparing hotspot candidates with body-ground candidates, they found that both gave a
similar detection ratio but the hotspot method was faster, more efficient and more robust.
In the next experiment, the training set was divided into three types of pedestrian:
along-street pedestrians, across-street pedestrians and bicyclists. Testing was then done
in two ways. In the first, a single classifier was applied to all pedestrians; this was
found to be slow and lengthy. In the second, multiple classifiers were used, each for a
specific candidate type; this gave more positive detections and reduced the training time
and the size of the support vector set, but the large number of classifiers slowed the
system down.
Figure 13: The figure shows the results of comparison between positive samples for hotspot candidates on
the left and the positive samples for body-ground candidates on the right obtained by Xu et al. The pictures
have been obtained from [Xu 2002]
Figure 14: The figure shows the three pedestrian classes considered: along-street, across-street
and bicycle. The pictures have been obtained from [Xu 2002]
After detecting humans in the infrared video, they had to track them. They used the human
head for this purpose, since it appears as a hot spot and its shape does not change
drastically between frames. Two methods were employed. First, a Kalman filter was used to
update the time-dependent parameters for each frame with the following equations.
The time update equations were:

$$\text{A priori position:}\quad S_k^- = \Phi\, S_{k-1}$$

$$\text{A priori error covariance:}\quad P_k^- = \Phi\, P_{k-1}\, \Phi^T + Q$$

The measurement update equations were:

$$\text{Kalman gain:}\quad K_k = P_k^- H_k^T \left(H_k P_k^- H_k^T + R\right)^{-1}$$

$$\text{A posteriori position:}\quad S_k = S_k^- + K_k \left(Z_k - H_k S_k^-\right)$$

$$\text{A posteriori error covariance:}\quad P_k = \left(I - K_k H_k\right) P_k^-$$

Here $S_{k-1}$ and $S_k$ are the estimated positions at times k-1 and k, and $S_k^-$ is the
estimated position at time k before it is corrected by the error between $Z_k$ and
$H_k S_k^-$. Similarly, $P_{k-1}$, $P_k$ and $P_k^-$ are the corresponding error
covariances. $\Phi$ is the transition matrix from $S_{k-1}$ to $S_k^-$, Q is the model
error covariance, R is the measurement noise covariance, $Z_k$ is the measurement at time
k, and $H_k$ is the noiseless connection between the measurement $Z_k$ and the position
$S_k$ at time k. Lastly, $K_k$ is the Kalman gain, or blending factor, that minimizes
$P_k$.
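The Kalman equations translate directly into code. The sketch below assumes a constant-velocity state [x, y, vx, vy], a measurement of head position only, and illustrative values for Φ, H, Q and R; these modeling choices are not taken from [Xu 2002], and the synthetic measurements are noise-free.

```python
import numpy as np


def kalman_step(S, P, Z, Phi, H, Q, R):
    """One predict/update cycle. S, P: estimate and error covariance at k-1;
    Z: measurement at k. Returns (S_k, P_k, S_k_prior)."""
    # Time update (a priori).
    S_prior = Phi @ S
    P_prior = Phi @ P @ Phi.T + Q
    # Measurement update (a posteriori).
    K = P_prior @ H.T @ np.linalg.inv(H @ P_prior @ H.T + R)
    S_post = S_prior + K @ (Z - H @ S_prior)
    P_post = (np.eye(len(S)) - K @ H) @ P_prior
    return S_post, P_post, S_prior


dt = 1.0  # one frame
Phi = np.array([[1, 0, dt, 0], [0, 1, 0, dt], [0, 0, 1, 0], [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], dtype=float)  # observe position only
Q = 0.01 * np.eye(4)
R = 1.0 * np.eye(2)
S, P = np.zeros(4), 100.0 * np.eye(4)  # vague initial estimate
for k in range(1, 31):
    Z = np.array([2.0 * k, 1.0 * k])  # synthetic head moving 2 px right, 1 px down per frame
    S, P, S_prior = kalman_step(S, P, Z, Phi, H, Q, R)
```

The a priori estimate `S_prior` is the prediction that a subsequent refinement step (mean shift, in the paper) would search around; the large initial P lets the filter lock on within a few frames.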
The head position in a new frame could have been estimated from the information in previous
frames alone, but since pedestrian movement is not linear, they applied the mean shift
method to find the accurate position around the a posteriori position. The mean shift
update is:

$$m(x) = \frac{\sum_{s \in S} K(s - x)\, w(s)\, s}{\sum_{s \in S} K(s - x)\, w(s)}$$

Here x is the current position, S is the set of pixel positions over which the sums are
taken, w(s) (the ratio of the original gray level to the current gray level at s) is a
weight function, and m(x) is the new position. K is the kernel, given by

$$K(x) = \frac{1}{4\pi}\, e^{-x^{2}}$$
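The mean shift update can be iterated to convergence as sketched below. For simplicity the weights w(s) are supplied directly as an image rather than computed from gray-level ratios, the constant factor of K cancels in the ratio, and the kernel scale `bandwidth` is a hypothetical parameter.

```python
import numpy as np


def mean_shift(weights, x0, bandwidth=3.0, iters=20, tol=1e-3):
    """Iterate m(x) = sum K(s - x) w(s) s / sum K(s - x) w(s) over all pixel
    positions s (as (x, y)) of a 2-D weight image until the shift < tol.
    K is the Gaussian profile exp(-||s - x||^2 / bandwidth^2)."""
    ys, xs = np.mgrid[0:weights.shape[0], 0:weights.shape[1]]
    pts = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(np.float64)
    w = weights.ravel().astype(np.float64)
    x = np.asarray(x0, dtype=np.float64)
    for _ in range(iters):
        d2 = ((pts - x) ** 2).sum(axis=1) / bandwidth ** 2
        kw = np.exp(-d2) * w  # K(s - x) * w(s)
        new_x = (kw[:, None] * pts).sum(axis=0) / kw.sum()
        if np.linalg.norm(new_x - x) < tol:
            return new_x
        x = new_x
    return x


ys, xs = np.mgrid[0:30, 0:30]
blob = np.exp(-((xs - 18) ** 2 + (ys - 12) ** 2) / 8.0)  # synthetic head hot spot
center = mean_shift(blob, x0=(14.0, 9.0))  # start near the Kalman estimate
```

Started a few pixels away, as a Kalman prediction would typically be, the iteration climbs to the weighted center of the hot spot in a handful of steps.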
The method in [Xu 2002] could track multiple pedestrians simultaneously in real time.
Detection was found to be time-consuming compared with tracking. Tracking was also much
more robust: almost no detected target was lost during tracking, whereas targets could be
missed at the detection stage. The shortcoming of tracking was that it only followed
already detected targets and did not look for new persons, so the authors ran detection
every 5 frames, or whenever a tracked target was lost. In this way, the system combines the
robustness of tracking with the ability to detect new individuals through an interleaved
detection stage.
Figure 15: The figure shows the results obtained by Xu et al. The left image shows the detection stage, where a circle
denotes a face hotspot; the right image shows the results of tracking the detected heads, where a square marks the
face region. The pictures have been obtained from [Xu 2002]
A single classifier was found to perform better than multiple classifiers. The detection
rate was not high, but because the detected frames were evenly distributed in time, a
pedestrian would almost always be detected within a short time span, which allowed
detection to be interleaved with tracking. The authors emphasized the time taken to detect
a human rather than the number of frames in which detection succeeded, because all
detected humans were tracked successfully.
[Nanda 2002] presents a real-time pedestrian detection system that works on low-level
infrared video and uses probabilistic templates to capture the different shapes of the
human body. The infrared video is used to segment the region of interest, and the template
is then used to identify pedestrians. The raw data, that is, the intensity value of each
pixel in the preprocessed image, is used to classify a region as pedestrian or
non-pedestrian. The authors chose this approach because low-level infrared video gives
images that correspond to the amount of heat radiated by the body, and this amount varies
with the body part, the clothes worn, the pose and even the person's state of mind. The
intensity therefore varies greatly over a full body region, so neighboring pixels do not
necessarily form a connected region, which is why a region-based approach was not chosen.
They also felt that noise, low contrast and ghosting effects in the image would practically
rule out an edge-based representation.
Using the raw pixel data, targets were extracted with an elementary thresholding procedure. From a training set of 1000 rectangular boxes containing pedestrians, the authors first calculated the mean and standard deviation of the pixels in the pedestrian region ($\mu_1$ and $\sigma_1$) and of the pixels in the background region ($\mu_2$ and $\sigma_2$). The threshold was then set by Bayesian classification, assuming equal a priori probabilities for the pedestrian and background regions and a Gaussian distribution for each. The equation used is:
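$$\text{Threshold} = \frac{\mu_1\sigma_2 + \mu_2\sigma_1}{\sigma_1 + \sigma_2} + \frac{\sigma_1\sigma_2}{\sigma_1 + \sigma_2}\ln\left(\frac{\sigma_1}{\sigma_2}\right)$$
The thresholding technique that is employed is as given below:
$$th(x, y) = \begin{cases} 1 & \text{if } image(x, y) > \text{Threshold} \\ 0 & \text{if } image(x, y) \le \text{Threshold} \end{cases}$$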
The resultant image was a binary image in which only the pedestrian was seen and the
background was eliminated.
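A sketch of this thresholding step, assuming the pedestrian statistics (μ1, σ1) and background statistics (μ2, σ2) are computed from training pixel values held in NumPy arrays. The closed form in `pedestrian_threshold` is the threshold formula as read from the text; when σ1 = σ2 it reduces to the midpoint of the two means. The function names are hypothetical.

```python
import numpy as np


def pedestrian_threshold(mu1, sigma1, mu2, sigma2):
    """Threshold between pedestrian (mu1, sigma1) and background (mu2, sigma2):

    (mu1*sigma2 + mu2*sigma1) / (sigma1 + sigma2)
        + sigma1*sigma2 / (sigma1 + sigma2) * ln(sigma1 / sigma2)
    """
    s = sigma1 + sigma2
    return (mu1 * sigma2 + mu2 * sigma1) / s + (sigma1 * sigma2 / s) * np.log(sigma1 / sigma2)


def fit_threshold(pedestrian_pixels, background_pixels):
    """Estimate the class statistics from training pixels and return the threshold."""
    p = np.asarray(pedestrian_pixels, dtype=float)
    b = np.asarray(background_pixels, dtype=float)
    return pedestrian_threshold(p.mean(), p.std(), b.mean(), b.std())


def binarize(image, threshold):
    """th(x, y) = 1 where image(x, y) > threshold, else 0."""
    return (np.asarray(image) > threshold).astype(np.uint8)
```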
Next a probabilistic template was developed. The training dataset consisted of 1000 rectangular images of 128 × 48 pixels, all containing humans of the same height but in different poses and orientations. Thresholding was performed on the training images so that the model would not learn the intensity variations of either the background or the foreground pixels. Each thresholded image was then shifted so that the centroid of its non-zero pixels coincided with the geometric center of the image. Finally, for each pixel of the template, the probability p(x, y) of it being a pedestrian pixel was calculated as the frequency with which that pixel had the value 1 in the training data.
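The template construction can be sketched as follows, assuming the training images have already been thresholded into binary masks of identical size (for example 128 × 48). The helper names are hypothetical; `shift_with_zeros` moves a mask without wrapping pixels around the border.

```python
import numpy as np


def shift_with_zeros(mask, dr, dc):
    """Shift a 2-D array by (dr, dc) pixels, filling vacated pixels with zeros."""
    out = np.zeros_like(mask)
    h, w = mask.shape
    r0, r1 = max(dr, 0), min(h + dr, h)
    c0, c1 = max(dc, 0), min(w + dc, w)
    if r0 < r1 and c0 < c1:
        out[r0:r1, c0:c1] = mask[r0 - dr:r1 - dr, c0 - dc:c1 - dc]
    return out


def center_on_centroid(mask):
    """Shift a binary mask so the centroid of its non-zero pixels sits at the image center."""
    rows, cols = np.nonzero(mask)
    if rows.size == 0:
        return mask.copy()
    h, w = mask.shape
    dr = int(round((h - 1) / 2 - rows.mean()))
    dc = int(round((w - 1) / 2 - cols.mean()))
    return shift_with_zeros(mask, dr, dc)


def build_probabilistic_template(masks):
    """p(x, y) = fraction of centered training masks in which pixel (x, y) equals 1."""
    centered = [center_on_centroid(np.asarray(m, dtype=float)) for m in masks]
    return np.mean(centered, axis=0)
```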
During pedestrian detection, the probabilistic template and a 128 × 48 test window were used to estimate the probability that the window contained a pedestrian. The authors argued that, given the prior information that the window contained a pedestrian, the probability of correct classification was p(x, y) for each pixel with value 1 and 1 − p(x, y) for each pixel with value 0. Using this logic they calculated the probability for every pixel and obtained the combined probability that the window contained a person by summing the individual probabilities. They assumed that the value at a pixel was independent of its neighbors.
Figure 16: The figure shows the probabilistic template developed and used by Nanda et al. for detecting
pedestrians. The picture has been obtained from [Nanda 2002]
The equation used to calculate the combined probability is:
$$\text{combined probability}(i, j) = \sum_{y=1}^{128}\sum_{x=1}^{48}\Big(th(x, y)\,p(x, y) + \big(1 - th(x, y)\big)\big(1 - p(x, y)\big)\Big)$$
Here (x, y) ranges over the 128 × 48 window around pixel (i, j). Computing the combined probability at every pixel yields a probability map. The mean and standard deviation of the combined probability were calculated over the 1000 training samples as well as over 1000 windows of 128 × 48 pixels that do not contain pedestrians. This was followed by thresholding the probability map.
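The combined probability can be evaluated for every window position at once. This sketch assumes a binary image `th` and a template `p` stored as NumPy arrays, indexes each window by its top-left corner rather than its center, and relies on `numpy.lib.stride_tricks.sliding_window_view` (NumPy 1.20 or later). It uses the identity th·p + (1 − th)(1 − p) = (1 − p) + th·(2p − 1), so the map is a correlation plus a constant.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def combined_probability_map(th, p):
    """Sum of th*p + (1 - th)*(1 - p) over every window of the template's size.

    th : binary image (1 = pixel above the intensity threshold)
    p  : probabilistic template, e.g. 128 x 48
    Entry (i, j) belongs to the window whose top-left corner is (i, j).
    """
    th = np.asarray(th, dtype=float)
    p = np.asarray(p, dtype=float)
    windows = sliding_window_view(th, p.shape)  # shape (H-h+1, W-w+1, h, w)
    return np.einsum("ijkl,kl->ij", windows, 2.0 * p - 1.0) + np.sum(1.0 - p)
```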
The system was implemented with 3 different sizes of probabilistic templates, each created from 1000 pedestrian training images. The template was found to work well even on people who were up to 25% off its scale. The implementation was also found to be robust to noise and occlusions.
Figure 17: Results obtained by Nanda et al. The pictures on the left are the input frames and the pictures on the right are the respective output frames. The blue contours in the output frames indicate human heads. The pictures have been obtained from [Nanda 2002]
The system was able to track multiple targets simultaneously in real time.
In [Haritaoglu-I 1998] a real-time visual system for detecting and tracking people and monitoring their activities in outdoor scenes is presented. The system, called “W4: Who? When? Where? What?”, works on monocular gray-scale video or on infrared images. It uses a combination of shape analysis and tracking to locate people and their body parts, such as the head, hands, feet and torso. It also builds a model of each person's appearance so that tracking can continue through interactions such as occlusions, which allows it to track multiple people at a time even when they occlude one another. The system constructs a dynamic model of people's movements to answer the questions what, where and when, and an appearance model of people to answer the question of who is being tracked. W4 tries to overcome the errors inherent in dynamic image analysis, such as instability of the segmentation process over time and objects splitting when they overlap background regions of similar color.
The system detects the foreground regions in each frame by combining background analysis with simple low-level processing of the resulting binary image. The background is modeled statistically by recording, for each pixel, the minimum and maximum intensity values and the maximal temporal derivative over a period of time. These values are estimated over several seconds of video and are then updated periodically, once the system has ascertained that there is nobody in the foreground. Using this background model, each pixel is classified as belonging either to the background region or to the foreground region.
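A minimal NumPy sketch of this background model, assuming a stack of grayscale frames captured while the scene is empty. `train_background` records the per-pixel minimum M, maximum N and largest absolute interframe difference D; `foreground_mask` applies the W4 test |M(x) − I(x)| > D(x) or |N(x) − I(x)| > D(x). The periodic model update is omitted and the function names are hypothetical.

```python
import numpy as np


def train_background(frames):
    """Per-pixel background model from frames of an empty scene.

    frames : array-like of shape (T, H, W) with T >= 2
    Returns M (minimum), N (maximum) and D (largest absolute interframe difference).
    """
    stack = np.asarray(frames, dtype=float)
    M = stack.min(axis=0)
    N = stack.max(axis=0)
    D = np.abs(np.diff(stack, axis=0)).max(axis=0)
    return M, N, D


def foreground_mask(image, M, N, D):
    """True where |M(x) - I(x)| > D(x) or |N(x) - I(x)| > D(x)."""
    I = np.asarray(image, dtype=float)
    return (np.abs(M - I) > D) | (np.abs(N - I) > D)
```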
Figure 18: Motion estimation of the body by silhouette edge matching between two successive frames, as employed by Haritaoglu et al. From left to right: the input image, the detected foreground region, the initial alignment of the silhouette edges based on the difference in medians, and the final alignment after silhouette correlation. The pictures have been obtained from [Haritaoglu-I 1998]
A pixel x of the image I belongs to the foreground region if and only if
$$|M(x) - I(x)| > D(x) \quad \text{or} \quad |N(x) - I(x)| > D(x)$$
where M, N and D are the minimum, maximum and largest interframe absolute difference images that together represent the background scene model. To segment the foreground objects from the background in each frame, the system first thresholds the image, then performs noise cleaning by applying one iteration of erosion to the foreground pixels, follows this with morphological operations (erosion and dilation), and finally performs object detection. According to the authors, a satisfactory combination of erosion and dilation is difficult to find for outdoor images, so they apply the morphological operators to the foreground pixels only after noise elimination. In effect the system reapplies background subtraction, followed by a single iteration of dilation and erosion, only to the areas identified as foreground. Lastly, a binary connected component analysis is applied to the foreground pixels to uniquely label each foreground object. The system can track objects even when the algorithm does not segment a person as a single foreground object, as can happen during temporary occlusion or when the object has been split into pieces. In such cases the system uses local correlation techniques to attempt to track the parts of the interacting objects.
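The segmentation stage can be sketched as follows. This is a simplified version under stated assumptions: noise cleaning is one binary erosion with a 3 × 3 square structuring element, followed by one dilation to restore object size (a morphological opening), and objects are then labeled by 8-connected component analysis. W4's re-application of background subtraction inside the surviving regions is omitted, and the function names are hypothetical.

```python
from collections import deque

import numpy as np


def _neighborhood_stack(mask):
    """The 9 shifted copies of a zero-padded boolean mask (3 x 3 neighborhood)."""
    h, w = mask.shape
    p = np.pad(mask, 1, mode="constant", constant_values=False)
    return [p[1 + dr:1 + dr + h, 1 + dc:1 + dc + w]
            for dr in (-1, 0, 1) for dc in (-1, 0, 1)]


def erode(mask):
    """Keep a pixel only if its whole 3 x 3 neighborhood is set."""
    return np.logical_and.reduce(_neighborhood_stack(mask))


def dilate(mask):
    """Set a pixel if any pixel in its 3 x 3 neighborhood is set."""
    return np.logical_or.reduce(_neighborhood_stack(mask))


def label_components(mask):
    """8-connected component labelling by breadth-first search (0 = background)."""
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    count = 0
    for r in range(h):
        for c in range(w):
            if mask[r, c] and labels[r, c] == 0:
                count += 1
                labels[r, c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < h and 0 <= nx < w
                                    and mask[ny, nx] and labels[ny, nx] == 0):
                                labels[ny, nx] = count
                                queue.append((ny, nx))
    return labels, count


def segment(fg_mask):
    """Noise cleaning (erosion), size restoration (dilation), then labelling."""
    mask = np.asarray(fg_mask, dtype=bool)
    return label_components(dilate(erode(mask)))
```

Isolated noise pixels vanish in the erosion step, while blobs large enough to survive it are grown back to roughly their original extent before labelling.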
Figure 19: The figure shows an example of how temporal templates are updated over time. The pictures
have been obtained from [Haritaoglu - I 1998]
Besides tracking the human body as a whole, the system also aims to locate body parts such as the head, hands, torso, legs and feet, and to understand the actions they undergo. W4 accomplishes these tasks using its shape analysis and template matching techniques. This approach is well suited to tracking body parts when some parts of the body are occluded and the shape is not predictable. The shape model is implemented as a Cardboard Model, which represents the relative positions and sizes of the body parts. The cardboard model, together with second-order predictive motion models of the body, can be used to predict the position of people in different frames of a video sequence. The cardboard model used represents a person in an upright standing pose and is used to track body parts such as the head, torso, feet, hands and legs. First, the pixels inside the boxes fitted to the body are used to calculate the principal axis, which helps estimate the pose of the body parts being tracked. The head is located first, followed by the torso and legs; the hands are located next, and finally the feet are located as end regions. After their positions are predicted, they are confirmed by using temporal texture templates.
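The principal-axis step can be sketched with an eigen-decomposition of the pixel-coordinate covariance. This is an illustrative assumption about how such an axis might be computed, not W4's published code: it takes a binary mask of the pixels inside one box and returns the centroid and the axis angle measured from the image's vertical (row) direction.

```python
import numpy as np


def principal_axis(mask):
    """Centroid (row, col) and principal-axis angle in degrees from vertical.

    The angle is folded into (-90, 90] because an axis has no preferred sign.
    Requires at least two foreground pixels.
    """
    rows, cols = np.nonzero(mask)
    pts = np.column_stack([rows, cols]).astype(float)
    centroid = pts.mean(axis=0)
    cov = np.cov(pts, rowvar=False)          # 2 x 2 covariance of (row, col)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    axis = eigvecs[:, np.argmax(eigvals)]    # direction of largest spread
    angle = np.degrees(np.arctan2(axis[1], axis[0]))
    if angle > 90.0:
        angle -= 180.0
    elif angle <= -90.0:
        angle += 180.0
    return (centroid[0], centroid[1]), angle
```

For an upright torso the angle stays near 0, so a large angle could flag a bent or horizontal pose before the body parts are searched for.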
Figure 20: The figure shows an example demonstrating the use of cardboard model to locate body parts of
humans using infrared imagery. The pictures have been obtained from [Haritaoglu - I 1998]
W4 is a real-time system that processes 20-30 frames per second (depending on image resolution) on a dual Pentium processor. It is capable of tracking multiple people at the same time against a complex background.
1.5. Calibration of infrared Cameras
In this section we briefly discuss the radiometric calibration of infrared cameras and present the commercial survey of blackbody calibrators that we conducted. The output of an infrared camera is the sum of the radiation emitted by the area being imaged, the radiation emitted by the surroundings (background) and reflected by the target, and the radiation emitted by the atmosphere itself. Calibration is the process of converting the raw data collected by an infrared camera into a standardized format, so that images of the same scene captured with different cameras convey the same information. Calibration is needed to obtain accurate temperature information from the imaged scene. It can be accomplished either with a reference blackbody source of known temperature or by spectral radiometric calibration against calibrated reference detectors.
Radiometric calibration establishes a direct one-to-one relationship between the gray-level response at a pixel and the absolute thermal emission from the corresponding scene element [Socolinsky 2001]. This relationship is called responsivity. Thermal emission is measured as flux, in units of power per unit area (W/cm²). For LWIR cameras, the gray-level response of thermal IR pixels is linear with respect to the intensity of the incident thermal radiation. The slope of the responsivity curve is the gain and its y-intercept is the offset. Both the offset and the gain vary significantly from pixel to pixel over an infrared focal plane array (FPA). During radiometric calibration, images of a blackbody radiator covering the entire field of view are obtained at two known temperatures [Socolinsky 2001]. The gain and offset are then computed from the radiant flux of the blackbody at each of these temperatures. For this we need the emission curve of the blackbody, i.e., its emitted flux as a function of wavelength and temperature, which is given by Planck's law: the flux emitted at wavelength λ by a blackbody at a known temperature T is as given below.
$$W(\lambda, T) = \frac{2\pi h c^{2}}{\lambda^{5}\left(e^{hc/(\lambda k T)} - 1\right)}$$
Here h is Planck's constant, k is Boltzmann's constant and c is the speed of light in vacuum. The flux observed by a sensor is therefore as given below, where R(λ) is the responsivity.
$$W(T) = \int W(\lambda, T)\, R(\lambda)\, d\lambda$$
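Planck's law and the sensor-weighted flux can be evaluated numerically. This sketch works in SI units (wavelength in meters, spectral exitance in W/m² per meter of wavelength, i.e. W/m³) rather than W/cm², and integrates with the trapezoidal rule. The flat responsivity R(λ) = 1 over 8 to 14 µm in the usage line is an assumption for illustration, not a property of any particular camera.

```python
import numpy as np

H = 6.62607015e-34   # Planck's constant, J s
C = 2.99792458e8     # speed of light in vacuum, m/s
K_B = 1.380649e-23   # Boltzmann's constant, J/K


def planck_exitance(wavelength_m, temperature_k):
    """W(lambda, T) = 2 pi h c^2 / (lambda^5 (exp(h c / (lambda k T)) - 1)), in W/m^3."""
    lam = np.asarray(wavelength_m, dtype=float)
    x = H * C / (lam * K_B * temperature_k)
    return 2.0 * np.pi * H * C**2 / (lam**5 * np.expm1(x))


def sensor_flux(temperature_k, wavelengths_m, responsivity):
    """W(T) = integral of W(lambda, T) R(lambda) d(lambda), trapezoidal rule, in W/m^2."""
    lam = np.asarray(wavelengths_m, dtype=float)
    y = planck_exitance(lam, temperature_k) * np.asarray(responsivity, dtype=float)
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(lam)))


# Usage, assuming a flat LWIR response between 8 and 14 micrometers:
lwir = np.linspace(8e-6, 14e-6, 2001)
flux_300k = sensor_flux(300.0, lwir, np.ones_like(lwir))
```

Integrating over all wavelengths with R(λ) = 1 recovers the Stefan-Boltzmann law σT⁴, which is a convenient sanity check on the units.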
This radiometric calibration holds only under the ambient conditions in which it was performed. For example, if a camera was radiometrically calibrated indoors, taking it outdoors where the ambient temperature differs significantly will change the gain and offset of the linear responsivity of the focal plane array pixels, and the radiometric calibration must be repeated. This effect is mainly due to the optics and to the heating of the FPA, which causes the sensor to see more energy [Socolinsky 2001]. Radiometric calibration thus standardizes all thermal IR data collections, whether they are taken under different conditions, with different cameras or at different times. In the process of radiometric calibration, the spectral response of the system is obtained first and the radiometric calibration is then performed.
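The two-point procedure described earlier in this section can be sketched per pixel as follows. It assumes the linear model gray = gain × flux + offset, full-field blackbody frames at two temperatures, and in-band blackbody flux values that have already been computed (for example from Planck's law); the function names are hypothetical.

```python
import numpy as np


def two_point_calibration(gray_cold, gray_hot, flux_cold, flux_hot):
    """Per-pixel gain and offset of the linear model gray = gain * flux + offset.

    gray_cold, gray_hot : frames of a full-field blackbody at two known temperatures
    flux_cold, flux_hot : blackbody flux reaching the sensor at those temperatures
    """
    g_cold = np.asarray(gray_cold, dtype=float)
    g_hot = np.asarray(gray_hot, dtype=float)
    gain = (g_hot - g_cold) / (flux_hot - flux_cold)
    offset = g_cold - gain * flux_cold
    return gain, offset


def gray_to_flux(gray, gain, offset):
    """Invert the per-pixel linear responsivity to estimate the incident flux."""
    return (np.asarray(gray, dtype=float) - offset) / gain
```

Because gain and offset are stored per pixel, the same two blackbody frames also correct the pixel-to-pixel variation across the focal plane array; if the ambient conditions change, both frames have to be re-acquired.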
1.5.1 Survey on Black body calibrators for IR cameras
Objects at any temperature above absolute zero radiate energy in the form of electromagnetic (EM) waves. A blackbody is an idealized object that absorbs all the radiation it receives and, at a given temperature, emits more thermal radiation at every wavelength than any other object at that temperature. An ideal blackbody never exists in practice; only specially developed laboratory sources come close to it, with emissivities approaching 1 (0.95 to 0.995 for the calibrators surveyed here). The table given shows our review of the survey conducted on commercially available blackbody calibrators for infrared cameras.
| Model | Accuracy | Target size | Operating temperature range | Emissivity | Power | Dimensions | Price |
|---|---|---|---|---|---|---|---|
| Omega BB-2A | ±2% of rdg (±5%) | 6” (dia) plate | 100°F to 662°F | 0.95 | 115 Vac, 50/60 Hz | 5” × 6.3” × 5” | $650 |
| Omega BB-4A | ±1°C ±0.25% rdg (±1.8°F ±0.25% rdg) | 0.88” (dia) plate | 212°F to 1800°F | 0.99 | 115 Vac, 50/60 Hz or 230 Vac, 50/60 Hz; 400 W | 7.5” × 16.12” × 10.4” | $3595 |
| Omega BB-701 | ±0.8°C + 1 digit (±1.4°F), worst case | 2.5” (dia) plate | 0°F to 300°F | 0.95 | BB701: 115 Vac, 50/60 Hz, 175 W; BB701-230VAC: 230 Vac, 50/60 Hz, 175 W | 7.75” × 14.128” × 15.5” | $2995 |
| Omega BB-703 | ±1.4°C (±2.5°F) | 1.125” (dia) plate | 20°F to 752°F | 0.95 | BB703: 115 Vac, 50/60 Hz, 175 W; BB703-230VAC: 230 Vac, 50/60 Hz, 175 W | 5” × 2.2” × 6.1” | $890 |
| Omega BB-704 | ±0.8°C (±1.4°F) | 4” (dia) plate | 212°F to 752°F | 0.95 | 115 Vac, 50/60 Hz or 230 Vac, 50/60 Hz; 425 W | 16.12” × 7.5” × 10.38” | $2495 |
| Omega BB-705 | ±0.25% of reading ±1°C | 1.75” (dia) plate | 212°F to 1915°F | 0.99 | 115 Vac, 50/60 Hz or 230 Vac, 50/60 Hz | 22.3” × 20.5” × 23.6” | $9995 |
| Hotek Model 988 | ±0.3°F | 2.76” (dia) plate | 70°F to 115°F | 0.97 ± 0.02 | 70 W | 9.06” × 8.86” × 4.53” | N/A |
| Nagman BBSL | Within 0.5% of indicated temperature, minimum 3°F | 0.49” (dia) aperture | 45°F to 1112°F | Better than 0.97 | 220 Vac ±10%, 50 Hz, 500 W | 9.84” × 12.4” × 5.31” | N/A |
| Nagman BBSH | Within 0.5% of indicated temperature, minimum 5°F | 0.98” (dia) aperture | 932°F to 2192°F | Better than 0.97 | 220 Vac ±10%, 50 Hz, 1000 W | 14.76” × 12.2” × 5.91” | N/A |
| Hart Scientific HR 9132 | ±0.9°F at 212°F; ±1.4°F at 932°F | 2.25” (dia) | 122°F to 932°F | 0.95 ± 0.02 (8 to 14 µm) | 115 Vac (±10%), 3 A or 230 Vac (±10%), 1.5 A, switchable, 50/60 Hz | 4” × 6” × 7” | $2570 |
| Hart Scientific HR 9133 | ±0.25% of reading ±1°C | 2.25” (dia) | −22°F to 302°F (at 73°F ambient) | 0.95 ± 0.02 (8 to 14 µm) | 115 Vac (±10%), 1.5 A or 230 Vac (±10%), 1.0 A, switchable, 50/60 Hz | 6” × 11.25” × 10.5” | $3710 |
| MIKRON M340 | ±3°C | 2” (dia) aperture | −4°F to 300°F | 0.99 (+0.005/−0.000) | 115 Vac ±5%, 50/60 Hz, 300 W max. (230 Vac optional) | 6.57” × 11.02” × 11.02” | N/A |
| MIKRON M310, M315 | ±0.25% of reading ±1°C | 3” (dia) aperture | 41°F to 662°F | 0.99 (+0.005/−0.000) | 115 Vac ±5%, 50/60 Hz, 300 W max. (230 Vac optional) | 6.57” × 11.02” × 11.02” | N/A |
| MIKRON M320 (dual cavity) | ±0.25% of reading ±1°C | 3” (dia) aperture | 50°F to 570°F | 0.99 (+0.005/−0.000) | 230 Vac ±10%, 50/60 Hz, 1.5 kW max. (115 Vac optional) | 25.2” × 19.69” × 21.65” | N/A |
| MIKRON M305 | ±0.25% of reading ±1 digit | 1” (dia) aperture | 210°F to 1830°F | 0.995 (+0.0005/−0.0000) | 115 Vac ±10%, 50/60 Hz, 1.0 kW max. (230 Vac optional) | 10.63” × 16.93” × 14.57” | N/A |
| MIKRON M335 | ±0.4% of reading ±1 digit | 0.65” (dia) aperture | 570°F to 2730°F | 0.99 (+0.003/−0.000) | 115 Vac ±10%, 50/60 Hz, 2.0 kW max. (230 Vac optional) | 25.2” × 19.69” × 21.65” | N/A |
| MIKRON M330 | ±0.25% of reading ±1°C | 1” (dia) aperture | 572°F to 3100°F | 0.99 (+0.005/−0.000) | 208 to 230 Vac ±10%, 50/60 Hz, 15 kW | 67.32” × 22.05” × 32.28” | N/A |

Table: Review of the commercial survey of blackbody calibrators for infrared cameras conducted by us.
1.6. Conclusions
This report presented a survey of infrared imaging, with the main focus on its use in human tracking systems. Different applications of infrared imaging were studied, and the main approaches to human tracking using infrared imaging were reviewed. We also looked at object tracking and face region tracking in infrared imaging. Finally, a commercial survey of available blackbody calibrators for infrared cameras was presented.
1.6.1. Summary
We presented a brief review of the major applications of infrared imaging, followed by an in-depth survey of its use in human tracking systems. The other applications touched upon were motion detection, face recognition, pattern recognition, intrusion detection, surveillance and others, but we were very brief about their details. The survey mainly focused on general object tracking and human tracking techniques using infrared imaging. The papers reviewed tracked humans using either a single sensor or a sensor network. Human tracking was typically accomplished by first performing motion detection and then performing template matching to confirm that the object causing the motion was a human being. During tracking, features such as the human face, head, bust and silhouette were used. We also looked at some calibration techniques for infrared cameras and presented a commercial survey of different available blackbody calibrators for infrared cameras.
1.6.2. Future Work
Future work would involve designing our own infrared-imaging-based human tracking system. This would require a motion detection algorithm, a template matching algorithm to ascertain that the motion was caused by a human being, and a tracking system to continuously track the detected humans. We could build on ideas from past work as a starting point and then improve upon them to arrive at our own approach.
References
Papers
[Feller 2002] Feller Steven D., Evan Cull, David Kowalski, Kyle Farlow, John Burchett,
Jim Adleman, Charles Lin, David J. Brady, “Tracking and imaging humans on
heterogeneous infrared sensor array for tactical applications”, SPIE Aerosense 2002,
April, 2002.
[Nakamura 2001] Nakamura Masanobu, Huijing Zhao, Ryosuke Shibasaki, “Tracking
passenger movement with infrared video data”, Proc. ACRS 2001 - 22nd Asian
Conference on Remote Sensing, Vol. 2, pp. 1520-1523, Singapore, 5-9 November 2001.
[Fang] Fang Yajun, Ichiro Masaki and Berthold K. P. Horn, “New night visionary
pedestrian detection and display systems”, Artificial Intelligence Laboratory, MIT,
Cambridge, MA.
[Xu 2002] Xu Fengliang, Kikuo Fujimura, “Pedestrian detection and tracking with night
vision”, Proc. IEEE Intelligent Vehicles Symposium, Versailles, France, 18-20 June,
2002.
[Haritaoglu-I 1998] Haritaoglu Ismail, David Harwood and Larry S. Davis, “W4: Who?
When? Where? What? a real time system for detecting and tracking people”, Proc. Third
International Conference on Face and Gesture Recognition, pp. 222-227, Nara, Japan,
14-16 April, 1998.
[Nanda 2002] Nanda Harsh and Larry Davis, “Probabilistic template based pedestrian
detection in infrared videos”, Proc. IEEE Intelligent Vehicle Symposium, Versailles,
France, 18-20 June, 2002.
[Eveland 2001] Eveland Christopher K., Diego A. Socolinsky, Lawrence B. Wolff,
“Tracking human faces in infrared video”, CVPR Workshop on Computer Vision beyond
the Visible Spectrum, Kauai, December 2001.
[Braga-Neto 1999] Braga-Neto Ulisses, Manish Choudhary and John Goutsias,
“Automatic target detection and tracking in forward-looking infrared image sequences
using morphological connected operators”, 33rd Annual Conference on Information
Sciences and Systems - CISS'99, Vol. I, pp. 173-178, Baltimore, MD, March 1999.
[Bodor 2003] Bodor Robert, Bennett Jackson, Nikolaos Papanikolopoulos, “Vision-based
human tracking and activity recognition”, Proc. of the 11th Mediterranean Conf. on
Control and Automation, 18-20 June, 2003.
[Magneau 2002] Magneau Olivier, Patrick Bourdot, Rachid Gherbi, “3D tracking based
on infrared cameras”, Proc. International Conference on Computer Vision and Graphics,
Zakopane, Poland, September, 2002.
[LETA] The Law Enforcement Thermographers Association (LETA), “A Paper on 11
applications of infrared imaging recognized by them”
[Socolinsky 2001] Socolinsky Diego A., Lawrence B. Wolff, Joshua D. Neuheisel,
Christopher K. Eveland, “Illumination invariant face recognition using thermal infrared
imagery”, Proc. IEEE Computer Society Conference on Computer Vision and Pattern
Recognition (CVPR 2001), 2001.
[Xu 2003] Xu Fengliang, Kikuo Fujimura, “Human detection using depth and gray
images”, Proc. IEEE International Conference on Advanced Video and Signal based
Surveillance, Miami, FL, 21-22 July 2003.
[Haritaoglu-II 1998] Haritaoglu Ismail, David Harwood and Larry S. Davis, “Ghost: A
human body part labeling system using silhouettes”, 14th International Conference on
Pattern Recognition, pp. 77-82, Brisbane, Australia, 16-20 August, 1998.
[Kim 2003] Kim Young-Ouk, Joonki Paik, Jingu Heo, Andreas Koschan, Besma Abidi
and Mongi Abidi, “Automatic face region tracking for highly accurate face recognition in
unconstrained environments”, Proc. IEEE International Conference on Advanced Video
and Signal based Surveillance, pp. 29-36, Miami, FL, 21-22 July 2003.
[Prokoski 2000] Prokoski F., “History, current status, and future of infrared
identification”, Proc. IEEE Workshop on Computer Vision Beyond the Visible Spectrum:
Methods and Applications, pp. 5-14, Hilton Head Island, SC, 16 June 2000.
Publications Bibliography:
Fujiwara Hideto, Makiko Seki, Kazuhiko Sumi and Hitoshi Habe, “The Vehicle Tracking
Method using Texture based Background Subtraction”, Proceedings of the 7th Symposium
on Sensing via Image Information, pp. 17-22, 2000.
Jones B., “Design of a remotely operated intrusion detection system for security
applications”, Proceedings of IEEE International Carnahan Conference on Security
Technology, pp. 145-153, 1993.
Nakanishi Yasuto, Kenji Oka, Masayuki Kuramochi, Shohei Matsukawa, Yoichi Sato
and Hideki Koike, “Narrative Hand: Applying a fast-finger tracking system for media
art”, 11th International Symposium on Electronic Art (ISEA 2002), 2002.
Proceedings of IEEE International Conference on Advanced Video and Signal based
Surveillance, Miami, FL, 21-22 July 2003. (IRIS Publications Resources).
Kamijo Shunsuke, Yasuyuki Matsushita, Katsushi Ikeuchi and Masao Sakauchi, “Traffic
Monitoring and Accident Detection at Intersections”, UM3, 2000.
Srivastava Anuj and Xiuwen Liu, “Statistical hypothesis pruning for identifying faces
from infrared images”, Image and Vision Computing, 21(7), pp. 651-661, 2003.
Sugimura Koji, Yasuo Suga and Junichi Tujitani, “Counting system of pedestrian”,
Proceedings of the 7th Symposium on Sensing via Image Information, pp. 357-362, 2001.
Maurin B., O. Masoud and N. Papanikolopoulos, “Monitoring Crowded Traffic Scenes”,
Proceedings of the IEEE 5th International Conference on Intelligent Transportation
Systems (ITSC 2002), pp 19-24, Singapore, September 3–6, 2002.
Wren C. R. and A. P. Pentland, “Dynamic Models of Human Motion”, Proceedings of the
3rd IEEE International Conference on Automatic Face and Gesture Recognition, April
1998.
Websites
[1] Temperature sensor community website, “Applications of Infrared Imaging on
Temperatures.com”.
http://www.temperatures.com/tiapps.html
[2] A seminar presentation on the applications of infrared imaging, “Teamworknet Inc
website”.
http://www.teamworknet.com/ResourceLibrary/Presentations/IEEEThermalPresentation/default.aspx#1
[3] Infrared imaging based automated video security system, “Southwest Research
Institute website”.
http://www.swri.edu/4org/d10/autoeng/video/default.htm
[4] Applications of infrared imaging, “Marlow Industries Inc. website”.
http://www.marlow.com/
[5] Applications of infrared imaging, “Australian Thermal Imaging website”.
http://www.thermalimaging.com.au/index.html
[6] A real time infrared tracking system for virtual environments, “ERCIM official
website”.
http://www.ercim.org/publication/Ercim_News/enw53/foursa.html
[7] Reasoning and sensing for visual and infrared data, “ECE 573 Fall 2003 internal
webpage of Balasubramanian L”.
http://www.imaging.utk.edu/classes/fall2003/modsen/ece573/bala/index.htm
[8] Infrared Imaging in Modular Multipurpose Multi-sensor Robot, “Nikhil Naik's ECE
573 Fall 2003 internal webpage”.
http://www.imaging.utk.edu/classes/fall2003/modsen/ece573/nikhil/webtemplate/index.htm
[9] Black body calibrator’s manufacturer, "Omega Engineering Inc. website".
http://www.omega.com/toc_asp/subsectionSC.asp?subsection=K02&book=Temperature
[10] Black body calibrator’s manufacturer, "Hotek Technologies website".
http://www.hotektech.com/Isocomp.htm
[11] Black body calibrator’s manufacturer, "Nagman website".
http://www.nagman.com/body.asp
[12] Black body calibrator’s suppliers, "Davis Inotek Instruments website".
http://www.davis.com/showpage.asp?L3ID=1244
[13] Black body calibrator’s manufacturer, "Mikron Institute website".
http://www.mikroninst.com/literature/blackbody.pdf