Multi-camera System for Stone Slab Scanning
João Henrique de Agrela Vital
Thesis to obtain the Master of Science Degree in
Mechanical Engineering
Supervisors: Prof. Jorge Manuel Mateus Martins, Prof. Pedro Daniel Dinis Teodoro
Examination Committee
Chairperson: Prof. Paulo Jorge Coelho Ramalho Oliveira
Supervisor: Prof. Jorge Manuel Mateus Martins
Member of the Committee: Prof. João Rogério Caldas Pinto
June 2018
Acknowledgements
A sincere acknowledgement to my supervisors, Prof. Jorge M. M. Martins and Prof. Pedro D. D. Teodoro,
who guided me through the project by sharing their experience.
I would also like to express my gratitude to the Frontwave, S.A. team, who were always welcoming and
ready to help when needed. Special thanks go to Eng. Nuno Reis, for helping me with everything
related to hardware and structure assembly.
Finally, a warm thank you to my family, who are the foundation of my education and values, for
supporting me unconditionally. To my girlfriend, for being by my side and cheering me on. To Manas
Pitch, for being the family that I chose inside the academic institution, who accompanied me
throughout the years and made it a very happy journey.
Resumo
With Industry 4.0, some sectors have undergone a complete revolution. However, the stone processing
industry still relies on sub-optimal processes. Because companies are typically family-run and the
raw material varies widely, this sector is highly resistant to change. Companies such as Frontwave,
S.A. develop projects that aim to raise the sector's standards and believe that the first step is to
describe the geometry and colour of the final products. Acquiring these data will allow the next
operations on the product to be planned carefully, avoiding waste. The plan can then be sent to a CNC
machine or any other physical processing machine for a clean, planned execution. The image of the
product can also be used for classification, stock management and remote sales. This project proposes
a new solution for acquiring an image of the product. The development aimed to increase image
resolution and to minimise the costs associated with producing the machine. The project culminated in
a system consisting of a bar of cameras and their respective controllers. These modules perform a
first stage of distributed processing and send the resulting contributions to a PC, where the final
image is reconstructed. The process is based on current algorithms such as coarse-to-fine image
registration, data mapping using radial basis functions and intelligent image fusion methods, which
were adapted and implemented to serve the process in question. Comparing the results of the project
with the current machine produced by Frontwave, S.A., the proposed solution achieves an image
resolution ten times higher and saves 30% in imaging equipment costs. Placing the cameras closer to
the product also reduces the size of the structure, along the camera-to-product distance, by
approximately 80%.
Keywords: Industry 4.0, Computer Vision, Ornamental Stone, Scanning
Abstract
At the dawn of Industry 4.0, some sectors have seen their methods completely revolutionized. However,
the stone industry still relies on old, sub-optimal processes. Being traditionally a family business
that deals with non-standardised raw materials, this industry is very resistant to change. Companies
like Frontwave, S.A. are dedicated to raising the stone industry to higher standards and believe that
the first step is to create an accurate description of the geometry and colour of the final products,
in the form of an image. These data allow the next operations to be planned carefully, avoiding waste.
The plans can then be sent to a CNC machine or any other processing machine, for a clean and planned
execution. Additionally, the image may be used for product classification, stock management, non-store
retailing and post-processing planning. This thesis proposes a new solution for the acquisition of a
picture describing a stone slab. The development was driven by the goal of achieving the highest image
resolution at minimal cost. The resulting system consists of an array of cameras and their respective
controllers. The controller modules serve as a primary processing stage, sending their outputs to a
PC, which reconstructs the final image. State-of-the-art methods such as coarse-to-fine matching,
radial basis function warping and multi-resolution splining were adapted and implemented to achieve
the best results with the least computational expense. Compared with the current scanning machine
developed by Frontwave, S.A., the proposed solution achieves a resolution ten times higher and saves
30% in imaging equipment costs. Additionally, the camera-to-slab distance was reduced by 80%, allowing
for a much slimmer scanner.
Keywords: Industry 4.0, Computer Vision, Ornamental Stone, Scanner
Table of Contents
Acknowledgments
Resumo
Abstract
Table of Contents
List of Figures
List of Tables
Abbreviations
Nomenclature
1 Introduction
  1.1 Industry Paradigm
  1.2 Ornamental Stone Resources and Production in Portugal
  1.3 The Company
  1.4 The challenge
  1.5 Contributions
  1.6 Thesis Structure
2 Background
  2.1 Stone Scanning Machines
  2.2 Feature Detection and Matching
    2.2.1 Feature detection
    2.2.2 Feature Description
    2.2.3 Feature matching
  2.3 Image Fusion
  2.4 Colour Balancing
  2.5 Colour Correction
6 Results
  6.1 Single Image Reconstruction
  6.2 Feature Matching
  6.3 Image Fusion
  6.4 Computation Time and Efficiency
7 Conclusions
  7.1 Future Work
    7.1.1 Automatic Calibration
    7.1.2 Real-time validation
    7.1.3 Implementing GANS for up-scaling
References
List of Figures
1.1 Industrial evolution time-line.
1.2 Total production of ornamental stones from 1992 to 2002, in thousands of tonnes.
1.3 Revenues from exports of ornamental stones from 2005 to 2013.
1.4 Distribution of mining sites in Portugal according to the type of stone extracted.
2.1 Classical example of the aperture problem.
2.2 Image filtered with Gaussian filters of different sizes and standard deviations.
2.3 Edges highlighted using the Roberts, Prewitt, Sobel, Canny and Laplacian of Gaussian methods.
2.4 Corners detected using the Harris and the Shi-Tomasi methods, from left to right.
2.5 Blobs detected using different detection methods.
2.6 Example of Gaussian and Laplacian pyramids built using the Burt and Adelson’s method.
2.7 Graphical explanation of the theory behind the SIFT descriptor.
2.8 Plots showing different feathering window types.
2.9 Images representing overlapping regions.
2.10 Results of merging the images.
2.11 Graphical representation of the feathering windows used by the different methods for image blending.
2.12 Image resulting from splining the two sample images using the multi-resolution method proposed by Burt and Adelson.
2.13 Model of the colour checker used in the project.
6.1 Image reconstruction using different approaches.
6.2 Close ups on figure 6.1.
6.3 Example of matches found with the SIFT algorithm.
6.4 Results from different matching methods.
6.5 Images containing high frequency information blended using the tested methods.
6.6 Images containing low frequency information blended using the tested methods.
List of Tables
1.1 SWOT analysis of the Stone Industry in Portugal.
6.1 Matching methods comparison. Relative to the reference match, made by hand.
6.2 Final output dimensions.
Abbreviations
BMIE Block Matching with Initial Estimation
CEVALOR Centro Tecnológico para Aproveitamento e Valorização das Rochas Ornamentais e Industriais
DLT Direct Linear Transformation
DoG Difference of Gaussians
GANS Generative Adversarial Networks
GCDF Gaussian Cumulative Distribution Function
GLOH Gradient Location and Orientation Histogram
LoG Laplacian of Gaussian
MSER Maximally Stable Extremal Regions
PC Personal Computer
PCA Principal Component Analysis
RAM Random Access Memory
RANSAC Random Sample Consensus
RBF Radial Basis Function
RGB Red, Green and Blue
SIFT Scale Invariant Feature Transform
SURF Speeded Up Robust Features
SWOT Strengths, Weaknesses, Opportunities and Threats
Nomenclature
Image Analysis
I Image
g(σ) Gaussian Filter
σ Standard Deviation
I_σ Gaussian Filtered Image
∗ Convolution operator
∇² Laplacian operator
I_i Image derivative over the axis i
S Structure Tensor, Second moment matrix
λ_i i-th eigenvalue of a matrix
H Hessian Matrix
LoG Laplacian of Gaussian
Image Matching
H Homography matrix
R Rotation Matrix
T Translation Matrix
Radial Basis Functions
Φ Form function
p(x) Polynomial function
General
J Cost function
G Colour homogenization gains matrix
E Error function
Chapter 1
Introduction
This chapter presents the general context of the project. Given the project's industrial orientation,
an overview of current industrial standards is provided in the first section, Industry Paradigm. The
project is motivated by the needs of potential clients and validated by the current state of the
industry sector in Portugal, taken as a first target market. Both topics are addressed in the second
section, Ornamental Stone Resources and Production in Portugal. The development was supported by and
took place at Frontwave, S.A.'s facilities in Pêro Pinheiro; a brief summary of the company is given
in the third section, The Company. The project’s goals are defined in the fourth section, The
challenge, and the contributions and achievements are presented in the fifth section, Contributions.
Finally, the sixth section, Thesis Structure, outlines the structure of the remaining document.
1.1 Industry Paradigm
During the last century, industrial activities have experienced significant development through
standardisation, automation and production management. Although these developments are available to
any kind of industry, the nature of each business makes the strategies easier or harder to implement.
To date, technological development has been divided into four revolutions, portrayed in Figure 1.1.
The first traces back to 1780, with the first steam-powered loom, which relocated production from
homes to factories. The second revolution started 100 years later, with continuous production driven
by the division of labour and the introduction of conveyor belts. The third was marked by the
introduction of programmable logic controllers, which enabled digital programming of autonomous
systems. This paradigm still rules today’s modern system engineering and allows for efficient and
flexible automation systems.
Figure 1.1: Industrial evolution time-line.
The introduction of internet technologies into industry announced the arrival of the fourth
industrial revolution. Industry 4.0 is closely related to the implementation of cyber-physical
systems. This means that every product, component and entity in the industrial process has an identity
on the network, enabling permanent communication and data traffic. These data can be used in
optimization algorithms for dynamic scheduling, opening new paths for autonomous product navigation
through the production line. Many companies, organizations and universities are currently working on
the transition to the new industrial paradigm, following certain prerequisites such as:
1. Investment Protection: Industry 4.0 should be stepwise introducible into existing plants.
2. Stability: Industry 4.0 should not compromise production.
3. Data Privacy: access to data related to production and services must be controlled to protect the
company’s know-how.
4. Cybersecurity: Production systems must be programmed not to cause damage to the environment,
economy or humans.
"For Industrie 4.0, the term revolution does not refer to the technical realization but to the ability to meet
today’s as well as future challenges." [1].
1.2 Ornamental Stone Resources and Production in Portugal
Portugal is one of the world’s leading producers of ornamental stone, ranking 9th in worldwide
production. It has internationally renowned products such as its white and pink marbles, and
produces large quantities of light cream limestones, grey, yellow and pink granites, and dark grey
slate. Figure 1.2 shows the total production of ornamental stones from 1992 to 2002, as reported in
the study conducted by Sobreiro [2], and Figure 1.3 shows the revenues from international trade of
ornamental stone from 2005 to 2012, as reported by the Espírito Santo Research team [3]. Carvalho et
al. [4] estimate a total resource availability of 410 million cubic metres, of which 274 million
refer to granite, 76 million to limestone, 51 million to marble and 9 million to slate. Marble
extraction comes mainly from the region of Estremoz and Borba. The main limestone mining sites are
located in the regions of Leiria and Coimbra, and the main granite mining sites are located in the
north, around Monção and Valença. Figure 1.4 shows the distribution of mining sites in Portugal.
Figure 1.2: Total production of ornamental stones from 1992 to 2002, in thousands of tonnes.
Figure 1.3: Revenues from exports of ornamental stones from 2005 to 2013.
A study conducted by Banco Espírito Santo [3] shows that traditional producers like Portugal, Spain
and Italy are losing market share to countries like China, India and Turkey, which have high
resource availability and financial support for the development of the sector. This reinforces the
need for technological advancement, as well as for financial incentives to implement new market
strategies, in order to keep up with the competing countries. A Strengths, Weaknesses, Opportunities
and Threats (SWOT) analysis¹ of the stone industry sector in Portugal reveals that the large quantity
and high quality of the internationally recognised stones, as well as the know-how and long-lasting
tradition, represent strong points. The sector is threatened by strong competition from countries
like China, India and Turkey and by the development of products which replace stone. The sector’s
weaknesses include the lack of marketing strategy, of competent management teams and of
inter-corporate cooperation. Finally, there are opportunities in finding new production solutions and
uses for stone products, as well as in reaching out to international markets.
¹ A SWOT analysis, acronym for Strengths, Weaknesses, Opportunities and Threats, is a business
assessment tool and serves as a decision tool for the future development of the company in its own
commercial sector. The theory was introduced by Albert S. Humphrey in the 1960s.
Table 1.1 shows the complete SWOT analysis. Centro Tecnológico para Aproveitamento e Valorização das
Rochas Ornamentais e Industriais (CEVALOR) [5] processed this analysis and summarized the critical
concerns for the future success of the stone industry in the following points:
1. Improve marketing and communication strategies;
2. Increase the globalization efforts;
3. Specialize in non-standard products;
4. Increase the added value by extending the supply chain closer to the final consumer;
5. Invest in Human Resources qualification and training.
Internal origin (product/company attributes)
  Strengths (helpful to achieve the objective):
    • Large quantity and high quality of resources
    • Globally renowned products
    • Products exclusive to Portugal
    • Own know-how and technology
    • Long-lasting tradition of the business
  Weaknesses (harmful to achieve the objective):
    • Marketing and Management Strategy
    • Multiple small companies
    • Low inter-company cooperation
    • Poor human resource skills
External origin (environment/market attributes)
  Opportunities (helpful to achieve the objective):
    • New production solutions
    • Alternative uses for ornamental stone
    • Globalization
    • New markets
    • Training of Human resources
  Threats (harmful to achieve the objective):
    • Strong competitors (China, India)
    • Alternative products
    • Environmental issues
Table 1.1: SWOT analysis of the Stone Industry in Portugal.
The processing of ornamental stones is mechanized and automated. However, production management
and quality control still rely on human decision. This places the industry at level 3.0, meaning that
there is room for improvement and work to be done to bring it to a higher technological level. As
stated in the previous section, the type of business may present resistance to change, making it
harder to transition from one industrial paradigm to another. In Portugal, the stone industry resists
change for the following reasons:
1. The stone industry traces back thousands of years, to when stone was first extracted and
processed, and in Portugal it is usually a family-based business;
2. Stone is a natural product, making standardisation difficult due to the high variance of its
characteristics, e.g. dimensions, patterns and physical properties.
Figures 1.2 and 1.3 show that the production and sales are increasing. However, market analysis shows
the need for innovation to keep the business sustainable and profitable.
Figure 1.4: Distribution of mining sites in Portugal according to the type of stone extracted.
1.3 The Company
Frontwave, S.A. [6] is a company dedicated to developing solutions which bring the stone industry to
higher standards. Over its years of activity, the owners of stone processing factories expressed the
need to describe their final products digitally, in the form of an image, primarily for stock
management and as marketing material. Additionally, the image could be used for quality control,
material classification and as a reference for future processing of the slab. Consequently, the
company started a project with the goal of obtaining an image of the polished slabs using a machine
placed at the end of the production line. The project is called Stone Scan and is in the course of
being successfully introduced to the market. The images acquired may be used to showcase the
factory’s products online, making the information available worldwide and portraying an appealing
and clean view of the product, which eases communication with potential clients anywhere in the
world. This is in line with the conclusions drawn from the SWOT analysis presented in Section 1.2.
1.4 The challenge
The first scanning machines were a strong step towards the intended target. However, the image
resolution and the system's adaptability and flexibility to the production lines do not yet meet the
desired goals. In addition, the production cost was high for the results obtained. The challenge set
by the company was to design a machine that outperforms the previously developed one, with the
following requirements, relative to the Stone Scan machine:
1. Better resolution
2. Lighter structure with less volume
3. Modularity of the machine, i.e. the possibility to change the number of cameras to fit the
production line
4. Lower cost
The goal was to achieve better results using cheaper hardware, making it a more competitive product.
This poses several problems, such as multi-camera calibration and stitching and overall colour
correction. The results of the first machine built and the need for technological advancement in the
stone sector provided validation and motivation to develop a new version of the scanning machine,
using state-of-the-art algorithms and carefully selected hardware to fulfil the goals to the best
possible degree.
1.5 Contributions
The project encompassed the fundamental steps of a mechanical engineering project: hardware and
structure design, software development, and validation through testing of the final prototype. The
main contributions in each step were as follows:
• Hardware and Structure Design - Inspired by the layout of existing machines, the structure design
and the hardware chosen significantly reduce size and cost while achieving better performance.
While the structure is similar to that of the available machines, the innovation lies in the type of
hardware used for this solution.
• Software Development - The software was designed for camera modularity and flexibility and
consists of a chain of steps resulting in an image portraying the stone slab. The procedure is
supported by existing algorithms, which were adapted and chained together to produce the final
output. This phase of the project consisted of the following steps:
1. Creation of a local network for communication between Linux devices (camera controllers)
and Windows devices (PC), using the SSH and SCP protocols.
2. Development of Python scripts for synchronized image acquisition.
3. Development of Python scripts for image reconstruction, including geometric and colour
corrections.
4. Development of MATLAB® scripts for feature matching, colour mapping and image blending.
• Testing - During the testing phase, the methods used in the project were tested and evaluated
against state-of-the-art methods available in MATLAB® toolboxes. The results can serve as a
baseline for future development of applications working in similar conditions.
To the best of my knowledge, after extensive research and reading, this solution is innovative both
in the hardware used and in the chain of algorithms which leads to the final output.
1.6 Thesis Structure
The rest of the document is divided into the following chapters:
Chapter 2 – Background
Provides an insight into the current solutions available for the scanning of stone slabs, as well as
a theoretical background on state-of-the-art feature detection and matching, colour balancing and
image fusion techniques. Some of these techniques supported the development of the solution proposed
in this document; others are included for comparison and further information.
??
This chapter is confidential and is presented in Appendix ??. It presents the hardware used in the
physical setup as well as the connectivity between the devices in the system. The final section of
this chapter outlines the main processes needed to carry out the scanning procedure and defines the
requirements and constraints taken into consideration in the development of the system.
??
This chapter is confidential and is presented in Appendix ??. The system’s successful implementation
depends strongly on a correct calibration procedure. This chapter takes the reader through the
calibration of the system, describing the algorithms used in each step.
??
This chapter is confidential and is presented in Appendix ??. The data acquired in the calibration
process are used during image acquisition and processing, for example in camera rotation correction
and colour balancing. This chapter covers the process from image acquisition to the final panoramic
output.
Chapter 6 – Results
To assess the performance of the system and validate the choices made in its development, this chapter
presents the results and compares them with those of different approaches using state-of-the-art
algorithms.
Chapter 7 – Conclusions
A summary of the achievements of the project is presented. The final part of this chapter is dedicated
to proposed future work: improvements to the scanning machine and scientific research that can build
on its outputs.
Chapter 2
Background
This chapter presents the theoretical basis as well as the existing state-of-the-art algorithms and
equipment used to support the development. An overview of the existing equipment for stone slab
scanning is presented in the first section, Stone Scanning Machines. The device developed uses a
multi-camera array for scanning, hence the need to match the resulting images from each camera. The
second section, Feature Detection and Matching, provides a review spanning from the basics of feature
detection and matching to the current state-of-the-art algorithms. Merging the matched regions can be
challenging due to small discrepancies in size, rotation and colour of the features. The third
section, Image Fusion, presents a review of different feathering methods along with the advantages
and disadvantages of each. The colour distribution of matched regions often differs slightly between
cameras, creating panoramas with uneven colour. The fourth section, Colour Balancing, provides a
review of methods to solve this issue and create seamless panoramas. Finally, the whole process misses
the point if the portrayed colours do not correspond to what is perceived under standard lighting
conditions, such as sunlight. The fifth section, Colour Correction, details the theory behind the
current state-of-the-art method for colour correction. For a complete version of the document, the
reading of this chapter should be followed by ??, in Appendix ??.
2.1 Stone Scanning Machines
At first thought, the simplest solution is to take a single picture of the full slab. This is
impractical because slabs may measure up to 2 x 3 meters, which requires placing the camera far from
the slab and leads to a very large structure in an environment where it is difficult to control the
lighting conditions. The result would be lower resolution, colour distortion and reflections, none
of which are desired. To solve this issue, the current solutions take images closer to the slab and
stitch the outputs to create an image of the full slab. Currently there are four solutions on the
market: the Bstone Scaner [7] by Bstone, the Taglio Scanner [8] by Taglio, MapaScan [9] by MapaStone
and Iris StoneScan by D2 Technologies [10]. The Bstone is a portable device, capable of scanning slabs
up to 500 x 600 mm, manually, outside of the production line. The Taglio Scanner and MapaScan are
very similar solutions, designed to be installed on the production line and relying on a single
high-resolution camera to perform the acquisition. The StoneScan differs from the latter two by using
two high-end cameras instead of just one.
2.2 Feature Detection and Matching
Computer vision started being implemented in industry around the 1990s and was predicted to
revolutionise manufacturing processes and integrate controllable processes [11]. Nowadays, visual
sensors, i.e. cameras, are widely used for surveillance, for creating maps and panoramas, and in
industry, to control processes and help with quality checks. This is achieved by extracting and
processing the data acquired. When there are multiple sensors, the information from each must be
matched with the data coming from the others so that the sensed object or scene can be fully
characterised. This is done by detecting features in one image, describing them with a specific set
of metrics, and comparing these metrics with the ones found in the data acquired by another sensor.
Features from different sensors with similar metrics are potential matches. Once the features are
successfully matched, it is possible to compute a transformation matrix which relates the two images.
The key to a good match is to choose the most appropriate similarity measures, or metrics.
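As an illustration of the detect, describe, match and transform chain described above, the following
Python sketch matches hand-made descriptors by nearest neighbour and then estimates the simplest
possible transformation, a pure translation. It is only a sketch under stated assumptions: the
function names, the ratio threshold of 0.8 and the toy coordinates and descriptors are hypothetical
and are not taken from the system developed in this thesis, which may use a more general
transformation.

```python
import math

def match_features(desc_a, desc_b, ratio=0.8):
    """Pair each descriptor of image A with its nearest neighbour in image B.

    A pair is kept only when the nearest distance is clearly smaller than the
    second-nearest one (ratio test), which discards ambiguous candidates.
    The ratio value is an illustrative assumption.
    """
    matches = []
    for i, da in enumerate(desc_a):
        # Euclidean distance from descriptor i to every descriptor of image B.
        dists = sorted((math.dist(da, db), j) for j, db in enumerate(desc_b))
        best, second = dists[0], dists[1]
        if best[0] < ratio * second[0]:
            matches.append((i, best[1]))
    return matches

def estimate_translation(pts_a, pts_b, matches):
    """Least-squares translation mapping matched points of A onto B (the mean offset)."""
    n = len(matches)
    tx = sum(pts_b[j][0] - pts_a[i][0] for i, j in matches) / n
    ty = sum(pts_b[j][1] - pts_a[i][1] for i, j in matches) / n
    return tx, ty

# Toy data (hypothetical): three features seen by two cameras, shifted by (100, 5).
pts_a = [(10.0, 20.0), (40.0, 25.0), (30.0, 60.0)]
pts_b = [(130.0, 65.0), (110.0, 25.0), (140.0, 30.0)]
desc_a = [(0.9, 0.1, 0.0), (0.1, 0.8, 0.1), (0.0, 0.2, 0.9)]
desc_b = [(0.0, 0.25, 0.85), (0.88, 0.12, 0.0), (0.1, 0.82, 0.08)]

matches = match_features(desc_a, desc_b)
print(matches)                                      # [(0, 1), (1, 2), (2, 0)]
print(estimate_translation(pts_a, pts_b, matches))  # (100.0, 5.0)
```

Here the similarity measure is the Euclidean distance between descriptors. Changing the descriptor
or the distance changes which candidates survive the ratio test, which is why the choice of metric
matters for the quality of the match.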
2.2.1 Feature detection
The first step is to detect interest points in an image I. An interest point is a point which could be
used for 2D matching, usually associated with brightness discontinuities. Choosing such points must
account for the aperture problem, Figure 2.1, which occurs when a moving scene is observed through a
window that is not big enough to estimate its motion unambiguously. It was first mentioned by Horn and
Schunck [12] and studied more intensively by Anandan [13]. The problem arises because motion
estimation requires references that vary along both the x and y axes. Along an edge, motion can only
be estimated in the direction normal to the edge, hence the motivation for using corners or blobs as
interest points. The most common methods for detecting interest points analyse the derivatives of I.
Edges correspond to regions with a high derivative in only one direction, corners have high
derivative values in both directions, and blobs correspond to regions of low derivatives delimited by
an edge. Canny [14] proposed three criteria which should be satisfied by an edge detector and may be
applied to any interest point detector:
• Good detection - The detector should have a low probability of error, both in failing to mark true
edge points and in incorrectly marking non-edge points.
• Good localisation - Points marked by the detector as edge points should be as close as possible
to the centre of the true edge.
• Single response - The detector should only produce a single response to a given edge.
Figure 2.1: Classical example of the aperture problem. The striped patterns have different motion
directions. However, the apparent motion direction is the same when seen through the circular
path.
Gaussian filtering is widely used to filter out noisy data as well as to select specific frequencies. It
served as the basis for the first approaches to automatic scale detection. An image I is filtered by
performing a convolution with a Gaussian filter g, Equations 2.1 and 2.2.
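$$ g(x, y, \sigma) = \frac{1}{2\pi\sigma^2} \, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{2.1} $$

$$ I_\sigma(x, y) = I(x, y) * g(x, y, \sigma) = \sum_{i = x - 3\sigma}^{x + 3\sigma} \; \sum_{j = y - 3\sigma}^{y + 3\sigma} I(i, j) \, g(x - i, y - j, \sigma) \tag{2.2} $$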
where x and y correspond to pixel coordinates and σ is the standard deviation of the Gaussian
distribution. The filter is applied over a length of six σ (±3σ) along each coordinate, since this
interval contains 99.73% of the area under a one-dimensional Gaussian, i.e. the most significant portion of the filter. Figure 2.2 shows an
example of an image filtered with Gaussian filters of different sizes and standard deviations.
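The six-sigma rule can be checked numerically. The following is a minimal sketch, assuming a one-dimensional kernel sampled on integer pixels (the two-dimensional Gaussian is separable, so the same kernel can be applied along each axis); the function names are illustrative and not part of the project's code.

```python
import math

def gaussian_kernel_1d(sigma, half_width_in_sigmas=3.0):
    """Sample a 1D Gaussian on integer pixels from -radius to +radius and normalise it."""
    radius = int(math.ceil(half_width_in_sigmas * sigma))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def coverage(half_width_in_sigmas):
    """Fraction of a continuous 1D Gaussian inside +/- half_width_in_sigmas standard deviations."""
    return math.erf(half_width_in_sigmas / math.sqrt(2.0))

kernel = gaussian_kernel_1d(sigma=2.0)
print(len(kernel))               # 13 taps: radius 6 on each side plus the centre
print(round(coverage(3.0), 4))   # 0.9973 for a total length of six sigma
print(round(coverage(1.5), 4))   # 0.8664 for a total length of three sigma
```

A filter with a total length of three sigma keeps only about 87% of the area, which is consistent with the loss of filter information noted for the last column of Figure 2.2.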
Figure 2.2: Image filtered with gaussian filters of different sizes and standard deviations. Larger
values of sigma result in increased blurring of the image. Notice how the size of the filter should
be adjusted according to the standard deviation in order to utilise the most significant portion of
the filter. The last column shows a filter with a size of three sigma instead of six sigma, which
leads to significant loss of filter information.
Edge detectors
The first approaches, the Roberts [15], Prewitt [16] and Sobel [17] operators, Equations 2.3, 2.4 and 2.5,
consisted of calculating the derivatives of the image by applying a set of discrete differentiation masks
in different directions, Equations 2.6 and 2.7, highlighting the high frequency information. Canny [14]
developed a multi-stage algorithm where the gradients are obtained using the previously mentioned
masks, followed by non-maximum suppression to exclude false detections and by thresholding, which
identifies potential edges and eliminates the low score edges that are not linked to strong edges.
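$$ M_{Roberts} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix} \tag{2.3} $$

$$ M_{Prewitt} = \begin{bmatrix} -1 & 0 & 1 \\ -1 & 0 & 1 \\ -1 & 0 & 1 \end{bmatrix} \tag{2.4} $$

$$ M_{Sobel} = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix} \tag{2.5} $$

$$ I_x = f(x, y) * M_{Method} \tag{2.6} $$

$$ I_y = f(x, y) * M_{Method}^{T} \tag{2.7} $$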
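As an illustration of Equations 2.6 and 2.7, the sketch below applies the Sobel mask and its transpose to a small synthetic image. It is a simplified example under stated assumptions: the mask is slid in correlation form (a true convolution flips the mask, which only changes the sign of the result for these masks), border pixels are left at zero, and `apply_mask` is a hypothetical helper rather than a toolbox function.

```python
SOBEL = [[-1, 0, 1],
         [-2, 0, 2],
         [-1, 0, 1]]

def transpose(mask):
    """Return the transposed mask, used for the derivative along the other axis."""
    return [list(row) for row in zip(*mask)]

def apply_mask(image, mask):
    """Slide a 3x3 mask over the interior pixels (correlation form, borders left at zero)."""
    rows, cols = len(image), len(image[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = sum(mask[i][j] * image[r - 1 + i][c - 1 + j]
                            for i in range(3) for j in range(3))
    return out

# 5x5 test image with a vertical step edge between columns 1 and 2
image = [[0, 0, 10, 10, 10] for _ in range(5)]
ix = apply_mask(image, SOBEL)             # responds to the vertical edge
iy = apply_mask(image, transpose(SOBEL))  # zero, since the rows are identical
print(ix[2])  # [0, 40, 40, 0, 0]
print(iy[2])  # [0, 0, 0, 0, 0]
```

The response is high in only one direction, which is the signature of an edge described above.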
There is also the method of the Laplacian of the Gaussian (LoG), Equation 2.8. The Laplacian operator is
applied to a previously gaussian filtered image. This method is analogous to the Difference of Gaussians
(DoG), where two images filtered with different strength gaussian filters are subtracted, resulting in a
discrete version of the differentiation operator. Figure 2.3 shows the results of applying the different
detectors to a test image.
$$ \nabla^2 I_\sigma = \frac{\partial^2 I_\sigma}{\partial x^2} + \frac{\partial^2 I_\sigma}{\partial y^2} \tag{2.8} $$
Figure 2.3: Edges highlighted using the Roberts, Prewitt, Sobel, Canny and Laplacian of Gaus-
sian methods. The results were obtained using the computer vision toolbox from MATLAB.
Corner detectors
Figure 2.3 shows that edge detectors enhance both edges and corners. In a way, edge detectors
highlight all discontinuities, including corners. Corner detectors take the result of an edge enhancing
method and apply a scoring procedure to evaluate the presence of a corner feature. State-of-the-
art detectors include the Harris and the Shi-Tomasi operators, which analyse the structure tensor of
a previously gaussian filtered image, derived from its gradients, Equation 2.9, alternatively called the
second moment matrix. The difference between the methods lies in the coefficients used as metrics
for the detection. Harris [18] implements the corner score with Equation 2.10, where k is a tunable
sensitivity factor. This avoids the computation of eigenvalues, which is computationally more expensive.
Shi and Tomasi propose calculating the eigenvalues of the matrix and taking the minimum value as the
score [19], Equation 2.11. Both methods select the corners by applying a threshold to the score.
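$$ S = \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix} \tag{2.9} $$

$$ d_H = \det(S) - k \, \mathrm{trace}^2(S) \tag{2.10} $$

$$ d_{ST} = \min(\lambda_1, \lambda_2) \tag{2.11} $$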
Where λi corresponds to the ith eigenvalue of S and Ix and Iy are the derivatives of the image over axis
x and y, which can be determined using one of the methods of image differentiation presented in the
previous sub-section.
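The two scores can be compared on a toy example. The sketch below is illustrative only: it accumulates the structure tensor as an unweighted sum over a window (the detectors described above use a Gaussian weighted window), and the value k = 0.04 is an assumed, commonly quoted setting rather than one taken from this project.

```python
import math

def structure_tensor(gradients):
    """Sum gradient products over a window; returns (sxx, sxy, syy) of S in Equation 2.9."""
    sxx = sum(ix * ix for ix, iy in gradients)
    sxy = sum(ix * iy for ix, iy in gradients)
    syy = sum(iy * iy for ix, iy in gradients)
    return sxx, sxy, syy

def harris_score(sxx, sxy, syy, k=0.04):
    """Equation 2.10: det(S) - k * trace(S)^2, no eigenvalues required."""
    det = sxx * syy - sxy * sxy
    trace = sxx + syy
    return det - k * trace * trace

def shi_tomasi_score(sxx, sxy, syy):
    """Equation 2.11: smallest eigenvalue of the symmetric 2x2 matrix, in closed form."""
    half_trace = 0.5 * (sxx + syy)
    root = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy * sxy)
    return half_trace - root

edge_window = [(10.0, 0.0)] * 4                    # gradients in one direction only
corner_window = [(10.0, 0.0), (0.0, 10.0)] * 2     # gradients in both directions

for name, window in (("edge", edge_window), ("corner", corner_window)):
    s = structure_tensor(window)
    print(name, harris_score(*s), shi_tomasi_score(*s))
```

For the edge window the Harris score is negative and the smallest eigenvalue is zero, while both scores are large and positive for the corner window, so a threshold on either score separates the two cases.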
Figure 2.4: Corners detected using the Harris and the Shi-Tomasi methods, from left to right. The
results were obtained using the computer vision toolbox from MATLAB.
Blob detectors
Blobs correspond to areas without brightness discontinuities. However, a blob may be described using
its centre of mass, making it an interest point. Lindeberg [20] experimented using the determinant of
the Hessian, Equation 2.12, or the Laplacian, corresponding to the trace of the Hessian. Blobs were
detected by searching for the maximum of the normalized Laplacian of Gaussian (LoG) in scale-space,
where the scale corresponds to the amount of filtering applied to the image. The Laplacian of Gaussian
is normalized with σ2, as seen in Equation 2.13. Lowe approximates the Laplacian with the Difference
of Gaussians (DoG) and searches for local extrema of the scale-space. Matas et al [21] developed
the Maximally Stable Extremal Regions (MSER) which analyses a grey scale image to find connected
regions of similar pixel intensities, where the regions are surrounded by pixels with either higher or lower
intensity than all the pixels contained in the stable region. Figure 2.5 shows an example of the application
of the different blob detection methods.
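$$ H(f(q)) = \frac{\partial^2 f(q)}{\partial q_i \, \partial q_j} = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \, \partial y} \\ \dfrac{\partial^2 f}{\partial y \, \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} \tag{2.12} $$

$$ LoG_{Normalized} = \sigma^2 \cdot LoG(x, y) = \frac{1}{\pi\sigma^2} \left( \frac{x^2 + y^2}{2\sigma^2} - 1 \right) e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{2.13} $$

$$ DoG(x, y) = I_{\sigma} - I_{\sigma^{*}} \tag{2.14} $$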
Figure 2.5: Blobs detected using different detection methods. The results were obtained using
the computer vision toolbox from MATLAB.
Scale-Space Theory and Pyramids
The scale-space was developed from the necessity to detect features at different scales. The most com-
mon scale-space is the Gaussian scale-space, which is generated by consecutively filtering an image
with increasingly strong filters to progressively average out the highest frequencies in the image. Linde-
berg [20] introduced the notion of automatic scale selection, using the LoG to generate the scale space
levels. To make this process more computationally efficient, Burt and Adelson propose sub-sampling
the blurred images by a factor of two each time the filter's standard deviation doubles, i.e. at every octave, thus creating an
image pyramid [22]. Both the LoG and the DoG result in a pyramid where each level is a frequency band
in which features of the corresponding scale can be detected. Gaussian filtering acts as a low-pass filter,
while the derivative and the subtraction of Gaussians act as the continuous-time and discrete-time
versions of a high-pass filter; combined, they create the referred band-pass filter. Figure 2.6 shows an example
of a Gaussian and a Laplacian Pyramid built with Burt and Adelson’s method. Anandan [13] proposes
a coarse to fine search by decomposing an image into frequency bands using Burt and Adelson’s algo-
rithm, searching for correspondences in a coarse scale, projecting into a finer scale and searching on
the neighbourhood of uncertainty of the projection.
Figure 2.6: Example of Gaussian and Laplacian pyramids built using the Burt and Adelson’s
method.
2.2.2 Feature Description
The second step is to extract and describe the detected features. The simplest possible metric is to use
pixel intensity to describe features. These measures can be used accurately when both frames differ
from each other only by a translation vector u [23].
However, more complex problems involve camera rotations, scale, point of view and luminance changes,
motivating the development of invariant descriptors. A good example is using the colour distribution,
or histogram. This measure describes the window of search, not the pixels, meaning that the same
feature will have the same description even if it is rotated. However, the histogram will not be the
same under scale or point of view changes. To address these issues, more advanced descriptors were
created such as the Scale Invariant Feature Transform (SIFT) [24], Gradient Location and Orientation
Histogram (GLOH) [25], Shape Context [26] and Speeded Up Robust Features (SURF) [27], among
others. The SIFT method uses the scale-space theory to detect interest points at different scales. After
detecting a potential interest point at a defined scale, a 16x16 patch around the point is extracted and
normalized using a Gaussian filter where the standard deviation depends on the detection scale. The
patch gradients are computed using finite differences and are grouped over 4x4 windows, quantizing the
information into 8 orientations. This results in descriptors of dimension 128. Figures 2.7(a) and 2.7(b)
show the computation of gradient field and the histogram over 4x4 patches using a simplified feature
example.
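The origin of the dimension 128 can be made concrete with a short sketch. This is a simplified, SIFT-like descriptor under stated assumptions: it takes precomputed 16x16 gradient fields, assigns each pixel to one of the 4x4 windows and to one of 8 orientation bins, and accumulates the gradient magnitude. The Gaussian weighting, the interpolation between bins and the final normalisation used by the full method are deliberately omitted, and the function name is hypothetical.

```python
import math

def sift_like_descriptor(gx, gy, cells=4, bins=8):
    """Histogram of gradient orientations per window; returns cells * cells * bins values."""
    size = len(gx)                 # 16 for a 16x16 patch
    cell = size // cells           # each window covers cell x cell pixels
    descriptor = [0.0] * (cells * cells * bins)
    for r in range(size):
        for c in range(size):
            magnitude = math.hypot(gx[r][c], gy[r][c])
            angle = math.atan2(gy[r][c], gx[r][c]) % (2.0 * math.pi)
            b = int(angle / (2.0 * math.pi / bins)) % bins
            index = ((r // cell) * cells + (c // cell)) * bins + b
            descriptor[index] += magnitude
    return descriptor

# Synthetic patch whose gradient points along +x everywhere
gx = [[1.0] * 16 for _ in range(16)]
gy = [[0.0] * 16 for _ in range(16)]
descriptor = sift_like_descriptor(gx, gy)
print(len(descriptor))   # 128 = 4 x 4 windows x 8 orientations
```

In this synthetic patch every window puts all of its magnitude in the first orientation bin; rotating the patch would move that energy to another bin, which is why the histograms are shifted to a reference orientation before comparison.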
Figure 2.7: Graphical explanation of the theory behind the SIFT descriptor. (a) Gradient magni-
tude and direction and (b) Histogram of 4x4 windows of the extracted feature.
Differences in colour gains, saturation and contrast affect magnitude but not orientation. Therefore, this
descriptor is robust to different lighting conditions. Moreover, the histograms can be rotated over the
maximum magnitude to provide some robustness to rotation changes. Gradient directions are quantized
in π/4 intervals, which means the descriptor is robust to rotations up to approximately 45 degrees.
GLOH is an extension of the SIFT descriptor, designed to increase its robustness and efficiency. The
SIFT descriptor is calculated for a log-polar location grid, resulting in 17 location bins with gradient
orientation quantized in 16 bins. This generates a descriptor of dimension 272 (17 x 16), which is reduced to
128 elements using Principal Component Analysis (PCA). Shape Context is a descriptor similar to the
SIFT but using edge information instead of gradient information. The edges are detected using Canny’s
method [14]. The edge locations are described in a log-polar coordinate system and quantized in 9 bins
and edge orientation in 4 bins, leading to a descriptor of size 36.
2.2.3 Feature matching
The third and final step is to match the features found in different images. The most common method is
to take the L2 norm between the descriptors, Equation 2.15.
$$ \lVert X - Y \rVert_2 = \sqrt{(X - Y) \cdot (X - Y)} = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2 + \dots + (x_N - y_N)^2} \tag{2.15} $$
The matches are used to compute a transformation matrix, also known as a homography matrix, which
relates the two frames and consists of a rotation and a translation matrix stacked together, Equation 2.16.
The homography is computed using the location of a feature in both images, hence the need to find
matching features in both images. This is usually done by taking the Euclidean distance between
descriptors, where the best match corresponds to the pair with the lowest distance. Lowe [24] states that
this method is not robust enough and that it would be useful to have a way of discarding features that
do not have any good match in the database, proposing an additional condition. The features are matched only if

$$ \frac{d_{\text{second smallest distance}}}{d_{\text{smallest distance}}} \geq 1.5 $$

is true. This eliminates features that do not have any good match, as well as features which are not
unique and have multiple good matches, both of which are unsuitable for calculating the homography matrix.
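The matching rule above can be sketched in a few lines. The example below is a brute-force illustration, not the implementation used in this project: descriptors are plain lists of numbers, `match_features` is a hypothetical helper, and the threshold of 1.5 is the value quoted in the text.

```python
import math

def l2_distance(x, y):
    """Euclidean distance between two descriptors, Equation 2.15."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def match_features(left, right, min_ratio=1.5):
    """Return (left_index, right_index) pairs which pass the distance-ratio condition."""
    matches = []
    for i, d_left in enumerate(left):
        ranked = sorted((l2_distance(d_left, d_right), j)
                        for j, d_right in enumerate(right))
        if len(ranked) < 2:
            continue  # the ratio needs at least two candidates
        (best, j_best), (second, _) = ranked[0], ranked[1]
        # second / best >= min_ratio, written without a division by zero
        if second >= min_ratio * best:
            matches.append((i, j_best))
    return matches

left = [[0.0, 0.0], [5.0, 5.0]]
right = [[0.1, 0.0], [4.0, 4.0], [6.0, 6.0]]
print(match_features(left, right))  # [(0, 0)]
```

The first descriptor has one clearly closest candidate and is matched. The second is equally close to two candidates, so it is discarded as ambiguous, which is the behaviour the additional condition is meant to provide.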
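$$ H = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix}, \quad R = \begin{bmatrix} h_1 & h_2 \\ h_4 & h_5 \end{bmatrix}, \quad T = \begin{bmatrix} h_3 \\ h_6 \end{bmatrix} \tag{2.16} $$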
where $h_i$ are elements of the homography matrix. The R and T matrices are the rotation and translation
matrices implicit in the homography matrix. The last row $[h_7 \; h_8 \; h_9]$ corresponds to additional scaling
terms in the homogeneous coordinate space. For an affine homography, this row is set to $[0 \; 0 \; 1]$.
Estimating an homography
The homography matrix maps features from one image to another. There are various methods for
computing the homography. Most often the method used is the Direct Linear Transformation (DLT),
[28]. Consider a pair of matched features, one in the left image and one in the right image, with the
homogeneous coordinates $x = \{u_{left} \; v_{left} \; 1\}^T$ and $x' = \{u_{right} \; v_{right} \; 1\}^T$. Since the coordinate
space is homogeneous, the relation between these points can be written as:

$$ x' w = H x \tag{2.17} $$
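The relation between these can be computed as follows:

$$ \begin{bmatrix} u_{right} \, w \\ v_{right} \, w \\ w \end{bmatrix} = \begin{bmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{bmatrix} \begin{bmatrix} u_{left} \\ v_{left} \\ 1 \end{bmatrix} \tag{2.18} $$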
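Rewriting this as a system of equations leads to:

$$ w = h_7 u_{left} + h_8 v_{left} + h_9 $$

$$ u_{right} = \frac{h_1 u_{left} + h_2 v_{left} + h_3}{h_7 u_{left} + h_8 v_{left} + h_9} $$

$$ v_{right} = \frac{h_4 u_{left} + h_5 v_{left} + h_6}{h_7 u_{left} + h_8 v_{left} + h_9} $$

Using $h = [h_1 \; h_2 \; h_3 \; h_4 \; h_5 \; h_6 \; h_7 \; h_8 \; h_9]^T$ and

$$ B = \begin{bmatrix} u_{left} & v_{left} & 1 & 0 & 0 & 0 & -u_{right} u_{left} & -u_{right} v_{left} & -u_{right} \\ 0 & 0 & 0 & u_{left} & v_{left} & 1 & -v_{right} u_{left} & -v_{right} v_{left} & -v_{right} \end{bmatrix} \tag{2.19} $$

such that

$$ B h = 0 \tag{2.20} $$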
If the homography is normalized, i.e. $h_9$ is 1, the problem has 8 unknowns. Each pair of matched
features provides two equations, hence 4 pairs are needed in order to estimate h. The problem is solved by stacking the matrices B resulting from
different pairs in a matrix A and using singular value decomposition to obtain the least-squares
solution. This is equivalent to taking the eigenvector corresponding to the smallest eigenvalue
of the matrix $A^T A$. With the homography matrix, it is possible to project points from the left image to the
right and assess the error of this projection against the real position of the feature. Although
thresholding eliminates most of the unfit matches, false matches are still a possibility. They introduce
an error in the homography estimate and can be considered outliers of the positive match
data set. This motivated the use of non-linear, state-of-the-art outlier rejection procedures such as the
Random Sample Consensus (RANSAC) [29].
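For the minimal case of exactly four pairs and a normalized homography, the two equations per pair form an 8x8 linear system which can be solved directly. The sketch below illustrates that special case only; it is not the SVD-based least-squares solution described above, which is the appropriate choice when more than four pairs are available. The small Gaussian elimination solver is included to keep the example self-contained, and all function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def homography_from_four_pairs(pairs):
    """pairs: four ((u_left, v_left), (u_right, v_right)) tuples. Returns H with h9 = 1."""
    a, b = [], []
    for (u, v), (up, vp) in pairs:
        # Rows of B (Equation 2.19) with the h9 column moved to the right-hand side
        a.append([u, v, 1, 0, 0, 0, -up * u, -up * v])
        b.append(up)
        a.append([0, 0, 0, u, v, 1, -vp * u, -vp * v])
        b.append(vp)
    h = solve_linear_system([[float(x) for x in row] for row in a],
                            [float(x) for x in b])
    h.append(1.0)
    return [h[0:3], h[3:6], h[6:9]]

def project(H, point):
    """Map a left-image point to the right image and divide by the scale w."""
    u, v = point
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    return ((H[0][0] * u + H[0][1] * v + H[0][2]) / w,
            (H[1][0] * u + H[1][1] * v + H[1][2]) / w)

# Synthetic check: the right image is the left one shifted by (20, -5)
left_pts = [(0, 0), (100, 0), (100, 50), (0, 50)]
pairs = [((u, v), (u + 20, v - 5)) for u, v in left_pts]
H = homography_from_four_pairs(pairs)
print(project(H, (50, 25)))  # approximately (70.0, 20.0)
```

Projecting a point which was not used in the estimate and comparing it with its known position is the same error assessment described in the paragraph above.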
RANSAC
The RANSAC works by taking random samples using the necessary points to fit the function it is trying to
estimate and enlarging the set with data that produces coherent results, [29]. Applied to the homography
estimation, the algorithm would take 4 random pairs of matched features, compute the homography
matrix using a method like the DLT, and find the pairs which produce a coherent result through a number
of iterations. Knowing the probability of picking up a false match from the set, it is possible to calculate
the number of iterations k necessary to get a certain level of confidence z that at least one error free
selection of points was made, i.e. the algorithm succeeds, using Equation 2.21.
$$ k = \frac{\log(1 - z)}{\log(1 - w^n)} \tag{2.21} $$
where w is the probability that a randomly picked match is a true match and n is the number of points
needed for fitting the model. In the case of estimating the homography,
every pair of points is associated with a probability of a good match and 4 pairs are needed, leading
to n = 4. As an example, if the probability of picking a true match is 80% and the required probability
of success is 99.9%, then the number of iterations needed is found by substituting these values in
Equation 2.21, $k = \frac{\log(1 - 0.999)}{\log(1 - 0.8^4)} \approx 13.1$, which is rounded up to 14 iterations.
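The worked example can be reproduced with a short function. This is a minimal sketch of Equation 2.21, with the result rounded up to the next whole iteration; the function and parameter names are illustrative.

```python
import math

def ransac_iterations(confidence, inlier_probability, sample_size):
    """Smallest whole number of iterations k satisfying Equation 2.21."""
    p_clean_sample = inlier_probability ** sample_size   # w^n: all sampled pairs are true matches
    k = math.log(1.0 - confidence) / math.log(1.0 - p_clean_sample)
    return math.ceil(k)

print(ransac_iterations(0.999, 0.8, 4))  # 14
```

Because the probability of a clean sample is w raised to the power n, the required number of iterations grows quickly as the fraction of true matches drops, which is one reason to discard ambiguous matches before running RANSAC.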
2.3 Image Fusion
Image fusion refers to the process of transitioning between two overlapping images. The classical
methods consist in applying a membership function over a fixed region of the overlapping regions. This
function spans from 0 to 1 and can be linear or non-linear. Figures 2.8(a) and 2.8(b) show examples of
applying a sharp transition and using a transition based on a Gaussian Cumulative Distribution Function
(GCDF), function of the standard deviation.
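A GCDF membership function can be sketched with the error function from the standard library. The example below is an illustration under stated assumptions: it blends a single row of the overlap, the transition is centred on the middle of the overlap, and the names `gcdf_weights` and `blend_rows` are hypothetical.

```python
import math

def gcdf_weights(width, sigma):
    """Weights rising from about 0 to about 1 across `width` pixels, centred on the overlap."""
    centre = (width - 1) / 2.0
    return [0.5 * (1.0 + math.erf((x - centre) / (sigma * math.sqrt(2.0))))
            for x in range(width)]

def blend_rows(left_row, right_row, weights):
    """Blend two overlapping rows; weight w goes to the right image and 1 - w to the left."""
    return [(1.0 - w) * a + w * b for a, b, w in zip(left_row, right_row, weights)]

weights = gcdf_weights(width=9, sigma=1.5)
blended = blend_rows([100.0] * 9, [200.0] * 9, weights)
print([round(v, 1) for v in blended])
```

The standard deviation sets the width of the transition: a small value approaches the sharp transition, while a large value spreads the blend over the whole overlap.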
Figure 2.8: Plots showing different feathering window types. (a) Sharp transition. (b) Fixed window
feathering
Using a GCDF is preferred over a linear interpolation as it allows the ratio at which
the images are merged to be chosen. Merging two images can result in two artefacts, ghosts and seams. A ghost
is a faded copy of a feature which appears when the overlapping images are misaligned.
A seam is a visible transition in the final output. It occurs mostly when using a sharp transition, due to
slight colour differences or misalignment between the images. Assuming Figures 2.9(a)
and 2.9(b) correspond to overlapping regions of two images, Figures 2.10(a) and 2.10(b) show the
result of applying a sharp transition or GCDF transition. As mentioned before, the presence of a seam
is noticeable in Figure 2.10(a), where the transition from one image to another is clearly visible. In
addition, ghosts are present in Figure 2.10(b), where features of one image, (red stripes), are visible due
to a misalignment both in position and size. The images used as examples are intentionally misaligned
and with a different colour distribution to clearly expose each method’s advantages and disadvantages.
Figure 2.9: Images representing overlapping regions. (a) Left camera and (b) right camera.
Figure 2.10: Results of merging the images. (a) sharp transition and (b) a wider window using a
CDF function
These misalignments occur in the process of acquisition using multiple cameras due to errors in the
assembly process or hardware differences. To avoid seams, the window over which the mosaics are
interpolated should be comparable in size to the largest feature in the image. Moreover, to avoid ghosts, the
interpolation window should be smaller than twice the size of the smallest feature in the image. In the
sample images, the largest feature corresponds to the area below the red stripes and the smallest
feature corresponds to the red stripes. Since these two conditions conflict, no single feathering
window is optimal. This is the case for most images and imaging applications. To address this issue, Burt and
Adelson propose a method named multi-resolution spline. The term image splining is used to refer
to the procedure of merging two images avoiding seams. "A good image spline will make the seam
perfectly smooth, yet will preserve as much of the original image information as possible." [30]. The
proposal is to decompose the image into frequency bands and join each band with increasingly large
feathering windows. Thus, the high frequency features are blended using sharper weights while lower
frequency features are merged over a wider window. The decomposition is done by building Laplacian
Pyramids for both images as well as a Gaussian Pyramid for the weighting function. Figure 2.11 shows
a simplified example of the evolution of the weighting function over the pyramid levels.
Figure 2.11: Graphical representation of the feathering windows used by the different methods for
image blending.
The final step consists in reconstructing the splined image, which is done by summing each level of the
combined pyramid.
Summarising, the steps to achieve multi-resolution splining consist of:
1. Build Laplacian pyramids for images A and B, denoted by LA and LB.
2. Build a Gaussian pyramid for the weighting function, GM. The weighting function can be converted
to an image by attributing the function value to each pixel.
3. Combine the pyramid levels by computing $LS_{level} = LA_{level} \cdot GM_{level} + LB_{level} \cdot (1 - GM_{level})$.
4. Obtain the splined image by expanding and summing the levels of the combined pyramid.
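The four steps can be sketched on one-dimensional signals, which keeps the example short while preserving the structure of the method. This is a simplified illustration and not Burt and Adelson's exact formulation: it assumes a [1, 2, 1] / 4 smoothing kernel instead of their five-tap kernel, expands by sample repetition followed by smoothing, and requires the signal length to be divisible by two at every level. All function names are hypothetical.

```python
def blur(signal):
    """Smooth with the kernel [1, 2, 1] / 4, repeating the border samples."""
    n = len(signal)
    return [(signal[max(i - 1, 0)] + 2.0 * signal[i] + signal[min(i + 1, n - 1)]) / 4.0
            for i in range(n)]

def reduce_level(signal):
    """Low-pass filter, then keep every other sample."""
    return blur(signal)[::2]

def expand_level(signal):
    """Double the length by repeating samples, then smooth."""
    return blur([value for value in signal for _ in range(2)])

def gaussian_pyramid(signal, levels):
    pyramid = [[float(v) for v in signal]]
    for _ in range(levels - 1):
        pyramid.append(reduce_level(pyramid[-1]))
    return pyramid

def laplacian_pyramid(signal, levels):
    gauss = gaussian_pyramid(signal, levels)
    bands = [[a - b for a, b in zip(gauss[k], expand_level(gauss[k + 1]))]
             for k in range(levels - 1)]
    return bands + [gauss[-1]]          # the coarsest level keeps the low-pass residue

def collapse(pyramid):
    """Step 4: expand and sum the levels, starting from the coarsest one."""
    signal = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        signal = [a + b for a, b in zip(band, expand_level(signal))]
    return signal

def multiresolution_spline(image_a, image_b, mask, levels=3):
    la = laplacian_pyramid(image_a, levels)     # step 1
    lb = laplacian_pyramid(image_b, levels)
    gm = gaussian_pyramid(mask, levels)         # step 2
    combined = [[a * m + b * (1.0 - m) for a, b, m in zip(la[k], lb[k], gm[k])]
                for k in range(levels)]         # step 3
    return collapse(combined)

a = [100.0] * 16
b = [200.0] * 16
mask = [1.0] * 8 + [0.0] * 8                    # 1 selects image A, 0 selects image B
result = multiresolution_spline(a, b, mask)
print([round(v, 1) for v in result])
```

Although the mask itself is a sharp step, the two flat signals are joined by a smooth ramp, because their only content is in the coarsest band, where the mask has been blurred the most. High frequency detail, stored in the finer bands, would be joined with the sharper versions of the mask.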
The theory behind Gaussian and Laplacian pyramids is explained in detail in Section 2.2.1. Figure 2.12
shows the result of using this method to merge Figures 2.9(a) and 2.9(b).
Figure 2.12: Image resulting from splining the two sample images using the multi-resolution
method proposed by Burt and Adelson.
This method is not perfect: one artefact can be seen in the space between the middle red stripes.
However, there is a significant improvement over the results shown in Figures 2.10(a) and 2.10(b), as the
interpolation function is chosen according to the frequency content of the images to spline.
2.4 Colour Balancing
Colour balancing is the process of balancing the colours between frames on a stitched mosaic. Colours
may differ due to different exposure levels, colour gains or view point changes. These techniques focus
on transforming the colours from a source image to a target value and can be divided into parametric
and non-parametric approaches. Parametric approaches assume the colours can be transformed linearly using a
3x3 transformation matrix M such that $I_s M = I_t$, where $I_s$ and $I_t$ are the source and target images.
Early approaches include brightness compensation, where M is diagonal with equal diagonal
values. The transformation matrix M is found using the colour information of two overlapping areas.
The simplest model is the one where M is diagonal. It assumes that colour channels are independent.
M takes the form of Equation 2.22, where α = mean(R2)/mean(R1), and β and γ are found similarly, using the
intensity values from the green and blue channels. The advantage of this model is that it does not need
two strictly overlapping areas, since the area is being averaged.
$$ M_{diagonal} = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \end{bmatrix} \tag{2.22} $$
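The diagonal model reduces to three ratios of channel means, as the following sketch shows. It is an illustration with made-up pixel values: region 1 is taken as the source and region 2 as the target, and the helper names are hypothetical. The two regions do not need the same number of pixels, since only their means are used.

```python
def channel_means(pixels):
    """pixels: list of (r, g, b) tuples. Returns the mean of each channel."""
    n = float(len(pixels))
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def diagonal_gains(region_1, region_2):
    """Gains (alpha, beta, gamma): mean of each channel in region 2 over its mean in region 1."""
    m1 = channel_means(region_1)
    m2 = channel_means(region_2)
    return tuple(target / source for source, target in zip(m1, m2))

def apply_gains(pixels, gains):
    """Multiply each channel by its gain, i.e. multiply by the diagonal matrix M."""
    return [tuple(value * gain for value, gain in zip(p, gains)) for p in pixels]

region_1 = [(100, 80, 60), (120, 100, 80)]                   # overlap as seen by camera 1
region_2 = [(110, 90, 35), (132, 90, 35), (121, 90, 35)]     # same area as seen by camera 2
gains = diagonal_gains(region_1, region_2)
print([round(g, 3) for g in gains])  # [1.1, 1.0, 0.5]
```

After applying the gains, the channel means of region 1 equal those of region 2, although individual pixels are not guaranteed to match.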
Dependence between colour channels can be added, resulting in a linear model, Equation 2.23, where
M can be estimated using two overlapping regions $I_1$ and $I_2$ by computing $M = (I_1^T I_1)^{-1} I_1^T I_2$, where each I is an
(n, 3) matrix and n is the number of pixels in the region.
$$ M_{linear} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \tag{2.23} $$
These models can be extended to an affine transformation by adding an offset, resulting in an improvement
of the mapping accuracy. In this case, the matrix containing pixel intensity information should be
extended, taking the form of $I_i = [r_i \; g_i \; b_i \; 1]$. Equations 2.24 and 2.25 show the extended forms
of the diagonal and linear models.
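$$ M_{diagonal-affine} = \begin{bmatrix} \alpha & 0 & 0 \\ 0 & \beta & 0 \\ 0 & 0 & \gamma \\ \alpha_1 & \beta_1 & \gamma_1 \end{bmatrix} \tag{2.24} $$

$$ M_{linear-affine} = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \\ a_1 & e_1 & i_1 \end{bmatrix} \tag{2.25} $$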
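where the offset can be found by

$$ \begin{bmatrix} a_1 \\ e_1 \\ i_1 \end{bmatrix}^T = \begin{bmatrix} mean(R_2) \\ mean(G_2) \\ mean(B_2) \end{bmatrix}^T - \begin{bmatrix} mean(R_1) \\ mean(G_1) \\ mean(B_1) \end{bmatrix}^T \cdot \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} \tag{2.26} $$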
Although the diagonal model with affine transformation and the linear model with and without affine
transformation provide more accurate mappings, they require that the gain estimation is made using
exact pixel correspondence. In problems involving different cameras and points of views, it is rare that
an exact correspondence is found. Tian et al. [31] propose a solution for this problem. Consider two
pictures $I_1$ and $I_2$. Firstly, the maximum overlapping area is found. Then, taking the histograms of the two
regions, region 1 is transformed so that its histogram matches the histogram in image 2. Performing this
transformation results in a transformed region 1 with direct pixel correspondences with the original image 1, which
allows the application of any of the previously mentioned models to estimate the matrix M.
Non-parametric approaches rely on feature detection and balance the colours locally by finding a relation
between the matched features and applying the correction to neighbouring regions. Yamamoto et al.
[32] propose a method where SIFT features are extracted and the matches are used to generate a
look-up table using an energy minimization approach. These methods are not commonly used as the
results often do not compensate for the increased complexity and computational load associated with their
implementation. Different methods are suitable for different applications. Xu and Mulligan [33]
showed that parametric methods, despite being less complex than non-parametric ones, yield stable and
effective results while being computationally faster.
2.5 Colour Correction
Similarly to the previous section, colour correction is a form of colour warping. Colour correction is pre-
sented in a different section with the purpose of differentiating its use. While colour balancing focuses on
balancing the colours between adjacent mosaics to avoid seams, colour correction focuses on mapping
the overall colour of the resultant image to the intended Red, Green and Blue (RGB) values. These
values are commonly taken from a colour checker, consisting of a set of coloured patches with known
RGB values. The colour checker used was the SG X-Rite, shown in Figure 2.13.
Figure 2.13: Model of the colour checker used in the project.
The relation between source and target colour values is highly non-linear, hence the methods presented
in the colour balancing section are not suitable. Menesatti et al. [34] propose a non-linear mapping using
a Radial Basis Function (RBF) with a thin plate spline weighting function. The following section explains
the theory behind RBF warping. The theory is presented considering the application to the RGB colour
space.
Radial Basis Function
Given a data set $X = \{x_i\}_{i=1}^{N} \subset \mathbb{R}^3$ and corresponding function values $\{f_i\}_{i=1}^{N} \subset \mathbb{R}$, find the interpolant
$s : \mathbb{R}^3 \to \mathbb{R}$ such that

$$ s(x_i) = f_i, \quad i = 1, \dots, N \tag{2.27} $$

where $x = (r, g, b)$ are data points from the colour space, with r, g and b being the red, green and blue
intensity values and $f_i = (r^*, g^*, b^*)$ is the corresponding correct colour. The interpolant is chosen from
the Beppo-Levi space of distributions on $\mathbb{R}^3$ with square integrable second derivatives, which contains
a set

$$ S = \{ s \in BL^{(2)}(\mathbb{R}^3) : s(x_i) = f_i, \; i = 1, \dots, N \} \tag{2.28} $$
of solutions for the problem. Taking the rotation invariant semi-norm inherent to this space,
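$$ \lVert s \rVert^2 = \int_{\mathbb{R}^3} \left( \frac{\partial^2 s(x)}{\partial r^2} \right)^2 + \left( \frac{\partial^2 s(x)}{\partial g^2} \right)^2 + \left( \frac{\partial^2 s(x)}{\partial b^2} \right)^2 + 2 \left( \frac{\partial^2 s(x)}{\partial r \, \partial g} \right)^2 + 2 \left( \frac{\partial^2 s(x)}{\partial r \, \partial b} \right)^2 + 2 \left( \frac{\partial^2 s(x)}{\partial g \, \partial b} \right)^2 dx \tag{2.29} $$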
as a measure of energy or smoothness, the functions with the lowest energy, i.e.,

$$ s^* = \arg\min_{s \in S} \lVert s \rVert \tag{2.30} $$

are the smoothest and proven by Duchon [35] to have the form of:

$$ s(x) = p(x) + \sum_{i=1}^{N} \lambda_i \lVert x - x_i \rVert \tag{2.31} $$
where p(x) is a linear polynomial, $\lambda_i$ are coefficients with real values and $\lVert \cdot \rVert$ is the Euclidean norm. This
is a particular example of an RBF, where the data points $x_i$ are called centres of the function. A general
formulation of an RBF is

$$ s(x) = p(x) + \sum_{i=1}^{N} \lambda_i \Phi(\lVert x - x_i \rVert) \tag{2.32} $$
where p(x) is a low degree polynomial and Φ is a form function which is chosen according to the
problem. For fitting functions of three variables, the case of this problem, the bi-harmonic ($\Phi(r) = r$, the
case of Eq. 2.31) and tri-harmonic ($\Phi(r) = r^3$) functions are the advised choices, where the argument r of Φ
denotes the radial distance. In the specific case of this 3D
problem, the interpolant s(x) is defined by the coefficients $\lambda_i$ and by a polynomial of the form
$p(x) = c_0 + c_1 r + c_2 g + c_3 b$, where the variables (r, g, b) correspond to the red, green and blue colour channels.
To ensure the interpolant is contained in the Beppo-Levi space of distributions on $\mathbb{R}^3$, the coefficients $\lambda_i$
are required to fulfil the orthogonality conditions:

$$ \sum_{i=1}^{N} \lambda_i = \sum_{i=1}^{N} \lambda_i r_i = \sum_{i=1}^{N} \lambda_i g_i = \sum_{i=1}^{N} \lambda_i b_i = 0 \tag{2.33} $$
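The interpolation and orthogonality conditions may be combined in a linear system to solve for the
coefficients which define the RBF. Thus Equations 2.32 and 2.33 may be written as

$$ \begin{pmatrix} A & P \\ P^T & 0 \end{pmatrix} \begin{pmatrix} \lambda \\ c \end{pmatrix} = B \begin{pmatrix} \lambda \\ c \end{pmatrix} = \begin{pmatrix} f \\ 0 \end{pmatrix} \tag{2.34} $$

where

$$ A_{i,j} = \Phi(\lVert x_i - x_j \rVert), \quad i, j = 1, \dots, N \tag{2.35} $$

$$ P = \begin{bmatrix} 1 & r_1 & g_1 & b_1 \\ 1 & r_2 & g_2 & b_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & r_N & g_N & b_N \end{bmatrix}, \quad \lambda = \{ \lambda_1 \; \cdots \; \lambda_N \}^T, \quad c = \{ c_0 \; c_1 \; c_2 \; c_3 \}^T \tag{2.36} $$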
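RBFs have the particularity of having an associated linear system which is invertible, provided that the
centres are distinct and not all coplanar, hence the solution can be found by

$$ \begin{pmatrix} \lambda \\ c \end{pmatrix} = B^{-1} \begin{pmatrix} f \\ 0 \end{pmatrix} \tag{2.37} $$

The function value is in fact $\{f_i\}_{i=1}^{N} \subset \mathbb{R}^3$, which results in an RBF being fitted to each colour channel.
The coefficients found are then

$$ \begin{pmatrix} \lambda \\ c \end{pmatrix} = \left[ \begin{pmatrix} \lambda \\ c \end{pmatrix}_r \; \begin{pmatrix} \lambda \\ c \end{pmatrix}_g \; \begin{pmatrix} \lambda \\ c \end{pmatrix}_b \right] \tag{2.38} $$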
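A set of data $Y = \{y_i\}_{i=1}^{M} \subset \mathbb{R}^3$, where $y_i = \{r \; g \; b\}$, with r, g and b being the measured pixel
intensity values for the colours red, green and blue, can now be mapped into the real values using the
RBF coefficients calculated with the calibration data set X. This is done by evaluating Equation 2.32
at the new data points, with

$$ A_{i,j} = \Phi(\lVert y_i - x_j \rVert), \quad i = 1, \dots, M, \quad j = 1, \dots, N \tag{2.39} $$

$$ P = \begin{bmatrix} 1 & y_1 \\ 1 & y_2 \\ \vdots & \vdots \\ 1 & y_M \end{bmatrix} = \begin{bmatrix} 1 & r_1 & g_1 & b_1 \\ 1 & r_2 & g_2 & b_2 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & r_M & g_M & b_M \end{bmatrix} \tag{2.40} $$

resulting in the linear operation

$$ f = \begin{bmatrix} r_1^* & g_1^* & b_1^* \\ r_2^* & g_2^* & b_2^* \\ \vdots & \vdots & \vdots \\ r_M^* & g_M^* & b_M^* \end{bmatrix} = \begin{bmatrix} A & P \end{bmatrix} \left[ \begin{pmatrix} \lambda \\ c \end{pmatrix}_r \; \begin{pmatrix} \lambda \\ c \end{pmatrix}_g \; \begin{pmatrix} \lambda \\ c \end{pmatrix}_b \right] \tag{2.41} $$

Since A is an M x N matrix and P an M x 4 matrix, only the interpolation rows of the system are evaluated here; the orthogonality rows apply to the calibration step only.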
Where f corresponds to corrected pixel RGB values. Radial basis functions are calculated for a calibra-
tion data set and later used to correct any data set acquired under the same calibration conditions. The
computation effort grows with the size of the calibration set, making the choice of said set important.
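The calibration and correction steps can be sketched end to end on a toy data set. The example below is a minimal illustration and not the project's implementation: it assumes the bi-harmonic function Φ(r) = r, uses colour values scaled to the range 0 to 1, and includes a small Gaussian elimination solver only to stay self-contained (a linear algebra library would normally be used). The five calibration colours are made up, and all function names are hypothetical.

```python
import math

def solve(a, b):
    """Solve a x = b for a matrix right-hand side (Gaussian elimination, partial pivoting)."""
    n, k = len(a), len(b[0])
    m = [[float(v) for v in a[i]] + [float(v) for v in b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: centres must be distinct and not coplanar")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + k):
                m[r][c] -= factor * m[col][c]
    x = [[0.0] * k for _ in range(n)]
    for r in range(n - 1, -1, -1):
        for j in range(k):
            tail = sum(m[r][c] * x[c][j] for c in range(r + 1, n))
            x[r][j] = (m[r][n + j] - tail) / m[r][r]
    return x

def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def phi(r):
    return r  # bi-harmonic form function, assumed for this sketch

def fit_rbf(centres, targets):
    """Build B = [[A, P], [P^T, 0]] and solve B [lambda; c] = [f; 0] for the three channels."""
    n = len(centres)
    b_matrix = [[0.0] * (n + 4) for _ in range(n + 4)]
    for i, xi in enumerate(centres):
        for j, xj in enumerate(centres):
            b_matrix[i][j] = phi(distance(xi, xj))
        for j, p in enumerate((1.0,) + tuple(xi)):
            b_matrix[i][n + j] = p      # block P
            b_matrix[n + j][i] = p      # block P^T
    rhs = [list(t) for t in targets] + [[0.0, 0.0, 0.0] for _ in range(4)]
    return solve(b_matrix, rhs)

def correct(pixel, centres, coefficients):
    """Evaluate s(x) = p(x) + sum_i lambda_i * phi(||x - x_i||) for each colour channel."""
    basis = [phi(distance(pixel, xi)) for xi in centres] + [1.0] + list(pixel)
    return tuple(sum(w * row[ch] for w, row in zip(basis, coefficients)) for ch in range(3))

# Made-up calibration set: measured patch colours and their reference values
measured = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
reference = [(0.1, 0.2, 0.3), (0.9, 0.1, 0.1), (0.2, 0.8, 0.1), (0.1, 0.2, 0.9), (1.0, 0.9, 0.7)]
coefficients = fit_rbf(measured, reference)
print(correct((0.5, 0.5, 0.5), measured, coefficients))
```

The fitted function reproduces the reference value at every calibration colour and interpolates smoothly in between. The system solved during calibration has N + 4 rows, which is why the size of the calibration set drives the computation effort.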
Chapter 6
Results
This chapter presents results from the implemented methods as well as the results of alternative methods
which were taken as validation for the procedures used. The first section, Single Image Reconstruction,
presents the results from the modular phase processing. The second section, Feature Matching,
presents matching results obtained using several state-of-the-art methods and compares them
with a manually made ground truth reference and with the method proposed in this document. The third
section, Image Fusion, presents results of different image blending methods, supporting the method
proposed for this project. Finally, the fourth section, Computation Time and Efficiency, provides a review
of the efforts taken to maximize the efficiency of the processes and harness the full processing potential
of the system.
6.1 Single Image Reconstruction
The solution implemented for video reconstruction relies on the correct measurement of the conveyor
belt's velocity and on a correctly performed calibration process. It could be argued that a more autonomous
stitching procedure could be implemented. To test this possibility, an algorithm similar to the
fine search presented in section ?? was used, assuming that the slab only undergoes a vertical translation.
The results show a classical example of the aperture problem, presented in section 2.2.1. In areas where
there are features like corners, the displacement vector was estimated correctly. However, in areas like
the ruler, features were not sufficient to estimate vertical translation, resulting in the shortening of the
ruler height, visible in Figure 6.1(a). Note that the images were rotated for a better space usage. The
images shown in the figure were reconstructed from left to right, corresponding to the rotated vertical
axis.
Figure 6.1: Image reconstruction using different approaches. (a) Stitching by search. (b) Stitching
using the system’s parameters. The images were reconstructed from left to right.
In the examples of the full reconstruction, the regions highlighted in red represent zones where the
automatic search is prone to fail due to insufficient features. Area 1 is visibly distorted in Figure 6.2(a)
and area 2, although harder to notice, is expanded by approximately 100 pixels, which translates
to roughly 10 mm. The error resulting from the search method would break the process since the
reconstructed images would miss information or contain information that would not match that of the other
cameras.
Figure 6.2: Close ups on figure 6.1. (a) Ruler in image 6.1(a). (b) Ruler in image 6.1(b).
6.2 Feature Matching
The algorithm implemented in this project for feature matching was tested against the SIFT method, using
the VLFeat toolbox [36], and the SURF, MSER and Harris implementations of the MATLAB® Computer
Vision Toolbox [37]. Table 6.1 provides a comparison between methods, supporting the method
proposed in this project. Additionally, one matching was made manually, to be taken as the ground truth,
i.e. the "perfect" match. Using the different methods, matrices P were generated, as described in
Equation ??, which were used as measures for comparison between methods. The row relative to vertical
alignment contains cumulative information which must be eliminated to leave the relative displacement
between consecutive frames. After eliminating this information, P can be compared with the reference
$P_{ideal}$ by taking the absolute value of the difference, Equation 6.1.
$$AD(P_{ideal}, P_{method}) = \left| P_{ideal} - P_{method} \right| \tag{6.1}$$
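The comparison in Equation 6.1 can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the project: the layout of P is defined in an equation that is not reproduced in this section, so the sketch assumes a 2×N array whose first row holds the horizontal displacement of each image pair and whose second row holds the cumulative vertical alignment. The function names are hypothetical.

```python
import numpy as np

def remove_cumulative(P, cumulative_row=1):
    """Turn the cumulative row of P into frame-to-frame displacements."""
    P = np.asarray(P, dtype=float).copy()
    # The first entry is kept; every later entry becomes the difference
    # to its predecessor (assumed layout, see the text above).
    P[cumulative_row] = np.diff(P[cumulative_row], prepend=0.0)
    return P

def absolute_difference(P_ideal, P_method):
    """Element-wise |P_ideal - P_method| after removing cumulative information."""
    return np.abs(remove_cumulative(P_ideal) - remove_cumulative(P_method))

# Made-up numbers, for illustration only
P_ideal = [[100, 102, 98], [10, 25, 37]]
P_method = [[101, 102, 95], [10, 24, 37]]
AD = absolute_difference(P_ideal, P_method)
```

Differencing the cumulative row first keeps an early vertical error from being counted again in every later frame.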
Figure 6.3: Matches found with the SIFT algorithm. False matches highlighted in red.
Local methods rely on feature extraction and matching, which yields a translation vector for each pair
of matched features. Figure 6.3 shows an example of two overlapping images matched using the SIFT
method. Although the methods try to maximize true positives, the results still contain false positives,
which were excluded using a RANSAC routine, explained in Section 2.2.3. Block Matching with Initial
Estimation (BMIE) and SIFT performed similarly and gave the best results. The other methods had
similar results for the vertical alignment but performed poorly in the horizontal alignment. This can
be explained by the presence of very similar patterns and small overlapping areas, which can lead to
too many false positive matches and too few true matches to estimate the displacement correctly. The
matching results are presented in Figure 6.4. Concerning run time, the BMIE method returns results 3.67
times faster than the SIFT method (1.55 s against 5.7 s, Table 6.1). Consequently, the method proposed
for matching the images was the one that performed best overall. Table 6.2 displays the slab width
measured for the reference, BMIE and SIFT matches. The physical measurement of the slab’s width was
600 mm. This confirms that the manual matching yields the result closest to the physical measurement,
and that the two automatic methods return results which are very close to it.
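The outlier rejection step mentioned above can be illustrated with a small RANSAC loop for a pure translation model. This is a sketch under stated assumptions, not the routine of Section 2.2.3: the function name, the 3 pixel tolerance and the iteration count are all hypothetical, and matches are assumed to be given as two arrays of (x, y) coordinates.

```python
import numpy as np

def ransac_translation(src, dst, tol=3.0, iters=200, seed=0):
    """Estimate a 2-D translation from matched points, ignoring outliers."""
    src = np.asarray(src, dtype=float)
    dst = np.asarray(dst, dtype=float)
    shifts = dst - src                 # one candidate translation per match
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(shifts), dtype=bool)
    for _ in range(iters):
        # Minimal sample for a translation model: a single match
        candidate = shifts[rng.integers(len(shifts))]
        residual = np.linalg.norm(shifts - candidate, axis=1)
        inliers = residual < tol
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Refit on the consensus set by averaging its translation vectors
    return shifts[best_inliers].mean(axis=0), best_inliers
```

With very similar patterns and a small overlap, as described above, the consensus set can become small or ambiguous, which is one way the large horizontal errors in Table 6.1 could arise.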
Method   Vertical error (px)   σ      Horizontal error (px)   σ       Run time (s)
BMIE     0.5                   0.54   2.17                    2.32    1.55
SIFT     0.6                   1.03   2.17                    2.14    5.7
SURF     1.7                   1.21   76.17                   59.35   1.4
MSER     0.6                   0.52   108.50                  90.61   3
Harris   1.5                   1.37   77.67                   60.60   1.4
Table 6.1: Matching methods comparison. Relative to the reference match, made by hand.
Figure 6.4: Results from different matching methods. (a) By hand. (b) BMIE. (c) SIFT. (d) SURF.
(e) Harris. (f) MSER.
Method         Slab width (mm)
Ground Truth   600.7
BMIE           602.7
SIFT           596.8
Table 6.2: Final output dimensions.
6.3 Image Fusion
Panoramas were obtained by merging image contributions using two naive solutions: no feathering and
fixed window feathering. These were compared to the implemented multi-resolution feathering to assess
its performance. The latter is fundamentally a mix between the first two, applying smaller feathering
windows to higher frequencies and a wider one to lower frequencies. Graphical representations of
the weights to use for each method are presented in Section ??, Figure 2.8 and Figure 2.11, for the
multi-resolution, GCDF and sharp feathering, respectively. To understand the benefits of using multi-
resolution blending, Figures 6.5 and 6.6 show examples of images containing high and low frequency
content, blended using different techniques. Analysing the high frequency results in Figure 6.5, the multi-
resolution blending performs similarly to the no-feathering solution, avoiding the ghosted area created
by the wider window feathering solution. Looking at the low frequency results in Figure 6.6, the multi-
resolution blending performs similarly to the wide fixed window feathering solution, avoiding the visible
seam created by the no-feathering solution.
Figure 6.5: Images containing high frequency information blended using the tested methods.
(a) No feathering. (b) Fixed window feathering. (c) Multi-resolution feathering.
Figure 6.6: Images containing low frequency information blended using the tested methods.
(a) No feathering. (b) Fixed window feathering. (c) Multi-resolution feathering.
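The behaviour described above, a narrow window for high frequencies and a wide one for low frequencies, can be illustrated with a two-band blend. This is a deliberate simplification of multi-resolution feathering with only two frequency bands, not the implementation evaluated here; it assumes two aligned grayscale images of equal size with a vertical seam, and the function names and window sizes are hypothetical.

```python
import numpy as np

def gaussian_blur(img, sigma):
    """Separable Gaussian low-pass filter with edge replication."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    # Filter along rows, then along columns
    rows = np.apply_along_axis(np.convolve, 1, padded, kernel, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kernel, mode="valid")

def ramp_mask(width, centre, half_window):
    """Weight of the left image: 1 left of the seam, 0 right of it."""
    cols = np.arange(width)
    if half_window == 0:
        return (cols < centre).astype(float)          # sharp cut
    return np.clip((centre + half_window - cols) / (2 * half_window), 0.0, 1.0)

def two_band_blend(left, right, centre, wide=32, sigma=4.0):
    """Blend low frequencies with a wide ramp and high frequencies with a sharp cut."""
    low_l, low_r = gaussian_blur(left, sigma), gaussian_blur(right, sigma)
    high_l, high_r = left - low_l, right - low_r
    w_low = ramp_mask(left.shape[1], centre, wide)
    w_high = ramp_mask(left.shape[1], centre, 0)
    low = w_low * low_l + (1 - w_low) * low_r
    high = w_high * high_l + (1 - w_high) * high_r
    return low + high
```

The sharp cut on the high band avoids ghosting of fine texture, while the wide ramp on the low band hides brightness differences across the seam. A full multi-resolution scheme repeats this idea over more than two bands.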
6.4 Computation Time and Efficiency
In a first approach, all the procedures were carried out on a single computer running a MATLAB® instance.
This setup was not optimal, since all procedures before the feature matching can be done separately and
simultaneously, using the controller connected to each camera. Therefore, the image reconstruction,
affine transformation, lens distortion correction and resolution homogenization were implemented in
Python, to be executed in the controller modules. Doing so raises a second problem: reconstructing a
high definition video into an image requires holding the reconstruction in Random Access Memory (RAM),
in uncompressed form, during the processing phase, which limits the size of the image that can be
created. With some simple operations, the maximum height of the image can be calculated, knowing
that:
1. 800 MB of RAM available.
2. Image converted from 8-bit unsigned integer to double precision floating point format, for process-
ing, taking 24 bytes per pixel.
3. Fixed image width of 1920 pixels.
4. Approximate camera resolution of 13.5 px/mm, at 18 cm from the object.
Then,
$$H_{max} = \frac{800 \cdot 10^{6}}{1920 \cdot 24 \cdot 13.5} = 1286\ \text{mm} = 1.286\ \text{m} \tag{6.2}$$
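The arithmetic of Equation 6.2 can be reproduced with the short script below, which makes it easy to see how the limit moves with the available RAM or the camera resolution. The function name is hypothetical and 1 MB is taken as 10^6 bytes, as in the equation.

```python
def max_image_height_mm(ram_bytes, width_px, bytes_per_px, px_per_mm):
    """Longest scan (in mm) whose uncompressed image fits in ram_bytes."""
    rows = ram_bytes / (width_px * bytes_per_px)   # image rows that fit in memory
    return rows / px_per_mm                        # rows converted to millimetres

h_max = max_image_height_mm(800e6, 1920, 24, 13.5)
print(f"{h_max:.0f} mm")   # prints "1286 mm"
```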
This means that the memory is depleted before the project’s requirements are even met. The procedure
implemented to solve this issue downsizes the image as it is being reconstructed, reducing the amount
of memory needed to hold the reconstruction by the square of the image sub-sampling factor.
Transferring the processing to the controllers allows the number of cameras to be scaled without
increasing the computation time prior to image matching.
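A minimal sketch of reducing the image while it is being reconstructed is given below. It is not the implementation used in the controllers: block averaging stands in for whatever interpolation the actual code uses, `strips` is a hypothetical sequence of row blocks extracted from successive video frames, and each strip is assumed to have a height that is a multiple of the sub-sampling factor.

```python
import numpy as np

def downsample_block(block, factor):
    """Average non-overlapping factor x factor cells of an (H, W, C) block."""
    h, w, c = block.shape
    h, w = h - h % factor, w - w % factor   # drop rows/columns that do not fill a cell
    cells = block[:h, :w].reshape(h // factor, factor, w // factor, factor, c)
    return cells.mean(axis=(1, 3))

def reconstruct_downsampled(strips, factor):
    """Stack strips from successive frames, keeping only reduced strips in memory."""
    reduced = [downsample_block(np.asarray(s, dtype=np.float64), factor)
               for s in strips]
    return np.concatenate(reduced, axis=0)
```

Because both the height and the width shrink by `factor`, the stored reconstruction has `factor**2` times fewer pixels, which is the quadratic memory saving stated above.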
In addition, the size of the image was already a limitation during the first approach, where all
processing was done on a single computer: the scan of a 60 cm by 60 cm slab resulted in an image with
92.5 million pixels, occupying 2220 MB of RAM and making the computations extremely slow. This issue
would only worsen for slabs with larger dimensions. In conclusion, it would always be necessary to
reduce the image sizes before the feature matching procedure, regardless of where the processing
takes place. The resize procedure is also helpful for storing the final image. For example, in a
factory with continuous production, storing full resolution images would rapidly deplete the
available storage. Although sub-sampling does lead to a decrease in resolution, which can be seen
as a disadvantage, this is not equivalent to acquiring the images at a lower resolution. Acquiring the
images with a lower resolution setting would mean that some features would not be sufficiently
described, while down-scaling from high resolution interpolates the well described features, reducing
the number of pixels while retaining as much information as possible. In summary, resizing the images
makes all operations faster and reduces both the RAM needed to carry out the procedures and the
memory needed to store the final output. There is a
trade-off between algorithm acceleration and final resolution, and the sub-sampling factor must be
chosen so that both lie in an acceptable range. The final output should be ready before the next stone
is scanned, and should have enough detail and quality to show all its characteristics to the client.
Therefore, this setting may vary among different factories, since image quality is partly subjective
and the time interval between slabs in the production lines may vary.
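The memory figures quoted in this section can be checked, and the effect of a candidate sub-sampling factor explored, with the sketch below. It assumes 24 bytes per pixel and 1 MB = 10^6 bytes, as used earlier in the section. The function names are hypothetical, and the 800 MB budget is used only as an illustrative limit.

```python
import math

def image_memory_mb(pixels, bytes_per_px=24, factor=1):
    """RAM in MB of an image sub-sampled by `factor` along each axis."""
    return pixels * bytes_per_px / factor**2 / 1e6

def smallest_factor(pixels, budget_mb, bytes_per_px=24):
    """Smallest integer sub-sampling factor whose image fits in budget_mb."""
    return math.ceil(math.sqrt(pixels * bytes_per_px / (budget_mb * 1e6)))

full = image_memory_mb(92.5e6)               # 60 cm x 60 cm slab at full resolution
factor = smallest_factor(92.5e6, 800)        # fit into an 800 MB budget
reduced = image_memory_mb(92.5e6, factor=factor)
print(full, factor, reduced)                 # 2220.0 2 555.0
```

A larger factor makes every later step faster, but the choice is bounded from above by the detail the client needs to see, which is the trade-off discussed above.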
Chapter 7
Conclusions
This document proposes an innovative system for scanning stone slabs. The system consists of an array
of cameras and respective controllers, and a Personal Computer (PC), connected through a network that
enables device communication and capture synchronization. The system can scan slabs up to 5 meters in
length and, since it was built for modularity, can be adapted to fit any conveyor belt. It was inspired
by the existing stone scanner Iris StoneScan, by D2 Technology in partnership with Frontwave, S.A., and
its development and design were driven towards an increase in resolution and a decrease in price and
volume. Indeed, the resolution achieved was 14.9 pixel/mm, a nearly tenfold increase compared to the
1.55 pixel/mm achieved by the current version. The price is around 880 €/m, hence the cost of a 2.4 m
array would be 2112 €, representing 30% savings relative to the imaging equipment employed in the Iris
StoneScan. Regarding the size, the cameras stand approximately 20 cm from the slab, while the Iris
StoneScan has its cameras placed about 90 cm from the slab, representing a reduction of about 80%. The
processing time from the end of acquisition to the final output is around 10 seconds, 5 of which are
spent in the acquisition modules, hence a new slab can be scanned every five seconds. This is
sufficient for a regular stone processing factory.
These achievements came at the cost of assuming that the slab’s velocity is constant. This assumption
would be more easily satisfied if the cameras translated over a stationary slab, and not the other way
around, because the reduced weight and inertia of the cameras make their motion easier to control. One
of the versions of the Iris StoneScan works in the vertical position, where the slab is placed on a
support while the cameras move over it at a constant velocity; the proposed solution would therefore
give its best results when employed in this kind of system.
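The headline figures of this chapter follow from simple ratios, which can be checked with the short script below. It uses only the values quoted in the text; the variable names are hypothetical.

```python
new_res, old_res = 14.9, 1.55        # pixel/mm, proposed system vs current version
price_per_m, array_len = 880, 2.4    # EUR per metre of array, array length in m
new_dist, old_dist = 20, 90          # camera-to-slab distance in cm

gain = new_res / old_res                       # resolution gain
cost = price_per_m * array_len                 # cost of the 2.4 m array
reduction = (1 - new_dist / old_dist) * 100    # reduction in distance, in percent

print(f"resolution gain: {gain:.1f}x")          # 9.6x
print(f"array cost: {cost:.0f} EUR")            # 2112 EUR
print(f"distance reduction: {reduction:.0f} %") # 78 %
```

The resolution gain of 9.6 and the distance reduction of 78% are the values rounded in the text to "nearly ten times" and "about 80%".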
7.1 Future Work
This project combined scientific and industrial aims, hence the future work can be divided according to
whether its main interest is scientific or industrial, although both are related. The first and second
sub-sections, Automatic
Calibration and Real-time validation, have immediate industrial value, as the calibration process would
be easier and more accurate and the real-time validation would prevent any unsuccessful scan from
entering the system, eliminating the effort of tracking and removing an improper image from the
company’s database. The third sub-section, Implementing GANs for up-scaling, is of scientific interest,
since it exploits a relatively new method which is not yet fully developed and studied.
7.1.1 Automatic Calibration
As stated in Section ??, proper camera alignment is crucial for the quality of the output. The alignment
process would benefit both in time and accuracy if done automatically, similarly to the colour gains
calibration presented in Section ??. This could be done by following the proposed steps:
1. Develop and implement detection techniques to extract interest points from the calibration ruler.
The interest points correspond to the horizontal line and the spaced marks.
2. Analyse the information given by the interest points. The horizontal line provides information on
the rotation about the S axis, and the spaced marks provide information on the rotation about the A1
and A2 axes, as seen in Section ??.
3. Couple motors to control the camera’s rotation about its axes and implement a feedback loop to
drive the current state of the camera to its optimum position.
7.1.2 Real-time validation
The system was designed on the assumption that there is no slippage between the slab and the conveyor
belt. Developing a solution which reconstructs images even in the event of slab rotation and
translation during image acquisition appears to be computationally very expensive and, even if
implemented, would degrade the final output quality. Instead, an additional camera dedicated to
detecting rotation or horizontal translation would make it possible to validate that the slab did not
slip on the conveyor belt during acquisition. If slippage is detected, the process would be
interrupted, protecting the final output from unwanted effects. Since the images captured by the
validation camera are not used for reconstruction, this camera may capture at a lower resolution,
making real-time estimation possible.
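One way such a validation step could look is sketched below. It is only an illustration of the idea, not a design from this work: it covers horizontal translation only (rotation is not addressed), it assumes low resolution grayscale frames whose column-intensity profile changes little between consecutive frames, and the function names and the 2 pixel tolerance are hypothetical.

```python
import numpy as np

def lateral_shift(frame_a, frame_b):
    """Estimate the horizontal shift (in pixels) of frame_b relative to frame_a."""
    # Collapse each frame to a zero-mean column-intensity profile
    a = frame_a.mean(axis=0) - frame_a.mean()
    b = frame_b.mean(axis=0) - frame_b.mean()
    # Circular cross-correlation via FFT; the peak marks the best alignment
    corr = np.fft.ifft(np.fft.fft(b) * np.conj(np.fft.fft(a))).real
    shift = int(np.argmax(corr))
    n = len(a)
    return shift if shift <= n // 2 else shift - n   # map to a signed shift

def slipped(frame_a, frame_b, tol_px=2):
    """Flag a scan when the lateral shift between frames exceeds the tolerance."""
    return abs(lateral_shift(frame_a, frame_b)) > tol_px
```

Working on one-dimensional profiles of low resolution frames keeps the cost per frame small, which is what would make a real-time check plausible.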
7.1.3 Implementing GANs for up-scaling
Both the existing and the proposed new version of the stone scanning machine produce similar images,
the difference being that the proposed version produces images with roughly 10 times higher resolution.
It would be interesting to use images of the same stone slabs to train Generative Adversarial Networks
(GANs) to perform image up-scaling. GANs are a class of artificial intelligence algorithms which
implement two adversarial neural networks, competing in a zero-sum game. The algorithm works by
having one of the neural networks try to mimic the training set while the other judges the success of
the first. In this case, one of the neural networks would try to recreate an up-scaled image of a stone
slab, and the judging network would decide whether it was an image from the up-scaled training set or
an attempt by the other network. The ideal case is the one where the first neural network produces an
output which looks to the judge as if it had been taken from the training set. While its interest is
primarily scientific, this method could be used in the future to generate higher resolution images to
showcase the products to potential customers on a larger display, for example.
References
[1] R. Drath and A. Horch, “Industrie 4.0: Hit or hype? [industry forum],” IEEE Industrial Electronics
Magazine, vol. 8, pp. 56–58, June 2014.
[2] M. J. Sobreiro, “Produção nacional e comércio externo (1992 a 2002),” vol. 1, pp. 173–198, Feb
2002.
[3] E. S. Research, “Produção de rochas ornamentais. análise setorial,” vol. 1, 2014.
[4] J. Carvalho, J. Lisboa, A. Casal Moura, C. Carvalho, L. Sousa, and M. M. Leite, “Evaluation of the
portuguese ornamental stone resources,” vol. 548, pp. 3–9, Feb 2013.
[5] CEVALOR, “Estudo estratégico prospectivo 2004 – 2006,” p. 88, 2006.
[6] F. Technology, “Stonescan.” http://frontwave.pt/technology/project/stonescan/. Accessed: 2017-03-20.
[7] BStone, “Bstone scanner - the world’s first handy stone scanner.” http://www.bstone.com/. Accessed: 2017-03-20.
[8] T. S. House, “Scanner - marble and stone scanning.” http://www.taglio.it/en/stone/scanner-2/. Accessed: 2017-03-20.
[9] M. Scan, “Mapascan, the 1st scanner in the world of stone.” http://www.mapastone.com/. Accessed: 2017-03-20.
[10] D. Technology, “Stonescan iris.” http://www.d2technology.com/visao.html. Accessed: 2017-03-20.
[11] L. Rossol, “Computer vision in industry,” in Robot Vision, pp. 11–18, Springer Berlin Heidelberg,
1983.
[12] B. K. Horn and B. G. Schunck, “Determining optical flow,” Artificial Intelligence, vol. 17, no. 1,
pp. 185 – 203, 1981.
[13] P. Anandan, “A computational framework and an algorithm for the measurement of visual motion,”
International Journal of Computer Vision, vol. 2, pp. 283–310, Jan 1989.
[14] J. Canny, “A computational approach to edge detection,” IEEE Transactions on Pattern Analysis
and Machine Intelligence, vol. PAMI-8, pp. 679–698, Nov 1986.
[15] L. Roberts, Machine Perception of Three-Dimensional Solids. Jan 1963.
[16] J. Prewitt, “Object enhancement and extraction,” pp. 75–149, Feb 1970.
[17] I. Sobel, “An isotropic 3x3 image gradient operator,” Feb 2014.
[18] C. Harris and M. Stephens, “A combined corner and edge detector,” in Proc. of Fourth Alvey
Vision Conference, pp. 147–151, 1988.
[19] J. Shi and C. Tomasi, “Good features to track,” in IEEE CVPR, pp. 593–600, 1994.
[20] T. Lindeberg, “Feature detection with automatic scale selection,” Int. J. Comput. Vision, vol. 30,
pp. 79–116, Nov. 1998.
[21] J. Matas, O. Chum, M. Urban, and T. Pajdla, “Robust wide-baseline stereo from maximally stable
extremal regions,” Image and Vision Computing, vol. 22, no. 10, pp. 761 – 767, 2004. British
Machine Vision Computing 2002.
[22] P. J. Burt and E. H. Adelson, “The laplacian pyramid as a compact image code,” IEEE
Transactions on Communications, vol. 31, pp. 532–540, 1983.
[23] R. Szeliski, Computer Vision: Algorithms and Applications. New York, NY, USA: Springer-Verlag
New York, Inc., 1st ed., 2010.
[24] D. G. Lowe, “Distinctive image features from scale-invariant keypoints,” International Journal of
Computer Vision, vol. 60, pp. 91–110, 2004.
[25] K. Mikolajczyk and C. Schmid, “A performance evaluation of local descriptors,” IEEE Transactions
on Pattern Analysis and Machine Intelligence, vol. 27, pp. 1615–1630, Oct 2005.
[26] S. Belongie, J. Malik, and J. Puzicha, “Shape matching and object recognition using shape con-
texts,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, pp. 509–522, Apr
2002.
[27] H. Bay, A. Ess, T. Tuytelaars, and L. V. Gool, “Speeded-up robust features (surf),” Computer Vision
and Image Understanding, vol. 110, no. 3, pp. 346 – 359, 2008. Similarity Matching in Computer
Vision and Multimedia.
[28] R. I. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University
Press, ISBN: 0521540518, second ed., 2004.
[29] M. A. Fischler and R. C. Bolles, “Random sample consensus: A paradigm for model fitting with
applications to image analysis and automated cartography,” Commun. ACM, vol. 24, pp. 381–395,
June 1981.
[30] P. J. Burt and E. H. Adelson, “A multiresolution spline with application to image mosaics,” ACM
Trans. Graph., vol. 2, pp. 217–236, Oct. 1983.
[31] G. Y. Tian, D. Gledhill, D. Taylor, and D. Clarke, “Colour correction for panoramic imaging,” in
Proceedings Sixth International Conference on Information Visualisation, pp. 483–488, 2002.
[32] K. Yamamoto and R. Oi, “Color correction for multi-view video using energy minimization of view
networks,” International Journal of Automation and Computing, vol. 5, pp. 234–245, Jul 2008.
[33] W. Xu and J. Mulligan, “Performance evaluation of color correction approaches for automatic multi-
view image and video stitching,” in 2010 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, pp. 263–270, June 2010.
[34] P. Menesatti, C. Angelini, F. Pallottino, F. Antonucci, J. Aguzzi, and C. Costa, “RGB color calibration
for quantitative image analysis: The “3d thin-plate spline” warping approach,” Sensors, vol. 12,
pp. 7063–7079, May 2012.
[35] J. Duchon, “Splines minimizing rotation-invariant semi-norms in sobolev spaces,” in Constructive
Theory of Functions of Several Variables, pp. 85–100, Springer Berlin Heidelberg, 1977.
[36] A. Vedaldi and B. Fulkerson, “Vlfeat: An open and portable library of computer vision algorithms,”
in Proceedings of the 18th ACM International Conference on Multimedia, MM ’10, (New York, NY,
USA), pp. 1469–1472, ACM, 2010.
[37] “Matlab and computer vision system toolbox,” Release 2017a. The MathWorks, Inc., Natick,
Massachusetts, United States.