

Fractal Based Video Compression

Final Year MEng Project
Richard Wilding - richwilding@hotmail.com
Supervisor: Dr S. I. Woolley

April 2002

Abstract

This paper first examines fractal compression of still images using two very different types of partitioning. Techniques developed in the first stage of the project are then extended to produce a novel video stream coder: Delaunay triangulation is used to create a partitioning of the source frames that is suitable for triangle-based motion compensation, and the motion vectors are extended across multiple frames. The motion compensator tracks small objects well and accurately depicts the motion in even fast moving sequences. In the paper's conclusion the prototype coder is compared to current commercial codecs.

1 Aims and Objectives
2 Introduction
3 Still image compression
3.1 Fractal compression
3.2 Simple encoder prototype
3.2.1 Partitioning
3.2.2 Note on image format
3.2.3 Domain mapping
3.2.4 Massic and Geometric Transforms
3.2.5 Prototype results
3.2.6 Encoding performance improvements
3.3 Triangulated partitioned encoding
3.3.1 Optimisations
3.3.2 Comparisons and results
4 Delaunay Triangulation
4.1 Construction of Delaunay triangulation
4.2 Point insertion
4.3 Swapping edges
4.4 Locating a point within the triangulation
4.5 Triangle Orientation
4.6 Walking Algorithm
4.7 Splitting
4.8 Merging
4.9 Partitioning the image support with Delaunay triangulation
4.9.1 Splitting triangles over an image
4.9.2 Merging triangles over an image
4.9.3 Coding method for triangulation
4.10 Triangular Regions of data
4.10.1 Barycentric coordinates
4.10.2 Grid mismatch and sampling
4.10.3 Re-sampling
5 Video Compression
5.1 Global motion compensation
5.1.1 Process
5.2 Block Matching
5.3 Square based block matching
5.4 Triangular block matching
5.4.1 Process
5.4.2 Optimisations
5.4.3 Initial outputs
5.4.4 Extending motion vectors
5.4.5 Encoder performance and comparison
5.4.6 Embedded fractal transform properties
5.4.7 Saving motion vectors
5.5 Results
6 Conclusions
6.1 Discussion
6.2 Applications
6.3 Future research


1 Aims and Objectives

This project has a few clearly definable aims and objectives:

· Investigate video codecs:
    - Compression of intra-frames
    - Motion prediction
    - Motion compensation

· Produce a prototype video compressor:
    - Compression of intra-frames
    - Motion prediction
    - Motion compensation


2 Introduction

The delivery of modern digital video footage, be it via cable, the Internet, wireless networks or DVD, is a vast improvement on earlier analogue methods of video supply such as terrestrial analogue TV and VHS video. Digital representation of the moving image offers scope for substantial improvements in image definition and quality, greater accessibility of material through widened means of distribution, and an expansion of applications for video, such as remote medical imaging sent straight from the paramedic at the scene to the doctor at a hospital.

A major aspect of digital video delivery is the trade-off between video quality and the file size or bandwidth required to store or transmit that video. The compromise is often determined by the application of the video. A high quality lossless video, which will require a large amount of bandwidth for delivery, might be chosen for technical or medical video where the fine details are important (e.g. faults in pipelines). For the home entertainment industry a slightly less perfect video is acceptable, and for those with slow connections to the Internet, a much reduced video quality, and perhaps frame rate, is the only option available.

When video is stored digitally it is encoded; the data is then decoded during playback to recreate the original (or approximately the original) sequence of images. The scheme for encoding the original data and subsequently decoding it is known as a codec. The reduction in video quality, or loss of data, introduced by many of the current codecs is due to approximations, quantisation of values and the discarding of data. In response to inferior video quality, many codecs employ a level of intelligence or selectivity during compression (encoding).

This report is split between still image compression and video sequence coding. Still image compression is an important part of video coding because video is made up of a sequence of images shown rapidly one after another, giving the illusion of motion. Video codecs store still images in the stream, so knowledge of how they can be compressed is important if a video codec is to be produced. The report describes the technical processes involved in each of the prototype coders that have been produced; functional and conceptual diagrams are included to aid the description of the processes. To conclude, results and comparisons to existing encoding methods are presented and discussed, along with a narrative explaining the final prototype and ideas for future research.


3 Still image compression

3.1 Fractal compression

A fractal transform creates mappings between regions of data at different scales. A destination region is mapped to the source region which is most similar in content. The source region, called the domain, is generally larger than the destination region, the range. The domains are arranged to overlap more than one range, which makes repeated application of the mapping contractive towards some final result called the attractor. The final transform is a group of transformations, including translation, rotation and scaling, one for each range, and contains none of the original data itself. An inverse fractal transform starts with a randomly populated data block the same size as the original data, splits the data into the same range and domain regions, and performs the mapping from domain to range repeatedly until convergence; a single application of the mapping for every range block is called an iteration. After several iterations (usually between 5 and 10) the result is an approximation of the original data. This differs from lossy methods that discard some of the detail, in that none of the original data is kept at all.

Figure 1 – Highlighted regions of similarity

The fractal transform exploits the similarities within the data being encoded; it is not limited to images, but images are where the transform is used in this project. Images contain many similarities that often exist at different scales or different orientations, see Figure 1. The attractor of a fractal transform of an image, i.e. what the decoder should ideally converge to, is the image itself. The aim is to describe the image being encoded in terms of copies of small areas of it which have been adjusted in brightness, size and/or orientation and placed at another position in the image. The way in which the image is split into range and domain blocks is called the partitioning of the image.
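To make the decoding loop concrete, the sketch below iterates a hypothetical list of stored mappings over a randomly initialised image. The `mappings` structure, holding a range slice, a domain slice and the brightness/contrast parameters s and o (introduced in section 3.2.4), is an assumed representation for illustration; rotation and reflection of domains are omitted.

```python
import numpy as np

def fractal_decode(mappings, shape, iterations=10):
    # Start from randomly populated data the same size as the original.
    img = np.random.rand(*shape) * 255.0
    for _ in range(iterations):          # one pass over every range = one iteration
        nxt = np.empty_like(img)
        for r_sl, d_sl, s, o in mappings:
            dom = img[d_sl]              # larger domain region
            h, w = dom.shape
            dom = dom.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # contract 2:1
            nxt[r_sl] = s * dom + o      # brightness/contrast adjustment
        img = nxt
    return img
```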


3.2 Simple encoder prototype

Initially a simple fractal encoder using square range and domain blocks was developed to test the principle, before a more sophisticated version utilising triangle-based partitioning was prototyped. The general principles relating to both are discussed below, along with the results of the simple encoder, before the triangle-based encoder is covered in more detail.

3.2.1 Partitioning

Figure 2 – Lena image, grid partitioned range (left) and domain (right)

In the simple prototype the image is partitioned into equal sized squares of 2x2 pixels for range blocks and 4x4 pixels for domain blocks, see Figure 2. The range blocks do not overlap but the domain blocks do. A 128x128 image will therefore have 64x64 = 4096 range blocks and (128-3)x(128-3) = 15625 domain blocks. The fractal transform produces a mapping from one domain block to every range block; a given domain block can be mapped to zero or many range blocks, see Figure 3.

Figure 3 – Highlighted areas in the domain with associated mappings to the range blocks; note that the green domain has been matched to several range blocks.


3.2.2 Note on image format

Most of the prototypes developed in the project work on and produce greyscale images. The prototypes actually load 24bpp bitmaps and convert them to another colour model, YIQ (luminance, in-phase, quadrature). The reason for doing so is that the eye is more sensitive to the luminance channel, so it is advantageous for encoding to work primarily on this channel [1]. It also allows for a simple implementation: working on one block of data rather than three, or one large interleaved data block.
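A minimal sketch of the colour conversion, using the standard NTSC RGB-to-YIQ matrix (the function name and array layout are illustrative assumptions, not taken from the report):

```python
import numpy as np

# Standard NTSC RGB -> YIQ matrix (Y = luminance, I = in-phase, Q = quadrature).
RGB_TO_YIQ = np.array([[0.299,  0.587,  0.114],
                       [0.596, -0.274, -0.322],
                       [0.211, -0.523,  0.312]])

def rgb_to_yiq(rgb):
    """Convert an (H, W, 3) 24bpp RGB image into separate Y, I, Q planes."""
    yiq = rgb.astype(np.float64) @ RGB_TO_YIQ.T
    return yiq[..., 0], yiq[..., 1], yiq[..., 2]
```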

3.2.3 Domain mapping

For each range block the similarity to each domain must be evaluated and the most suitable domain selected. Because each domain block is exactly twice the size of every range block, a down sampled version of each domain can be pre-calculated, saving processor load when performing the comparisons. The down sampled version of a domain block is generated by taking the average of the four pixels in each quadrant, see Figure 4.


Figure 4 – Down sampling 4x4 domain to 2x2
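A sketch of this quadrant averaging for an even-sized NumPy block (the function name is illustrative):

```python
import numpy as np

def downsample_domain(domain):
    """Average each 2x2 quadrant of a 4x4 domain block, giving the 2x2
    block that is compared against range blocks (Figure 4)."""
    h, w = domain.shape
    return domain.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
```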

A comparison of a down sampled domain block to a range block can be made by calculating the squared error between the data contained in each. The most suitable domain block can be identified by finding the down sampled domain block that produces the least squared error, or in other words, minimises the distance between the blocks. See Equation 1.

$$se = \sum_{x}\sum_{y}\left(d_{xy} - r_{xy}\right)^{2}$$

Equation 1 – Calculating the squared error between data blocks

where $d_{xy}$ and $r_{xy}$ are the pixels at position (x, y) in the two sources being compared


3.2.4 Massic and Geometric Transforms

A massic transform is applied to each down sampled domain block as it is compared to each range block. The transformation is applied on a per range block basis and adjusts the brightness and contrast of the domain block to match that of the current range block; it cannot be pre-calculated in advance as was the case with down sampling the domains. The massic transform has two parameters, a scale s and an offset o, which translate the brightness and contrast of the pixels in the block to match those of the range being matched, see Equation 2.

$$s = \frac{C_{RD}}{\sigma_{D}^{2}} \qquad o = \bar{r} - s\,\bar{d}$$

where $C_{RD}$ is the covariance of the range block R and the domain block D, $\sigma_{D}^{2}$ is the variance of the domain block, and $\bar{r}$ and $\bar{d}$ are the mean pixel values of the range and domain blocks

Equation 2 - Calculation of massic parameters [7]

The massic transform can be integrated into the calculation of the squared error, Equation 3.

$$se = \sum_{x}\sum_{y}\left(s \cdot d_{xy} + o - r_{xy}\right)^{2}$$

Equation 3 – Squared error calculation integrated with the massic transform [7]
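A sketch of the parameter calculation using NumPy's covariance routine; the guard against a zero-variance (flat) domain block is an added assumption, not something stated in the report:

```python
import numpy as np

def massic_parameters(domain, rng):
    """Least-squares scale s and offset o (Equation 2) and the resulting
    squared error of the match (Equation 3)."""
    d = domain.ravel().astype(np.float64)
    r = rng.ravel().astype(np.float64)
    var_d = d.var()
    s = np.cov(d, r, bias=True)[0, 1] / var_d if var_d > 0 else 0.0
    o = r.mean() - s * d.mean()
    se = np.sum((s * d + o - r) ** 2)
    return s, o, se
```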

Domains are permitted to overlap so that the number of domains is maximised; this increases the likelihood of finding a good match for each range block. A way to increase this further is to populate the domain pool with rotated and reflected versions of the original domain blocks. For a square data block, seven geometric transformations produce unique versions of the original (Figure 5). Combined with the original domain pool this produces eight times as many domain blocks to compare against, but it also increases the processing time almost 8-fold.

Figure 5 - Possible geometric transformations for square blocks
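A sketch enumerating the eight versions of a square block (the identity plus the seven transforms of Figure 5, i.e. the rotations and their mirror images):

```python
import numpy as np

def square_isometries(block):
    """Return the original block plus its seven geometric transforms:
    the four rotations and a flipped copy of each."""
    versions = []
    b = block
    for _ in range(4):
        versions.append(b)           # current rotation
        versions.append(b[::-1, :])  # the same rotation, flipped
        b = np.rot90(b)
    return versions                  # 8 versions in total
```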


3.2.5 Prototype results

The transformation information that needs to be stored in order to perform the inverse transform correctly is summarised in Table 1 below. The data can be stored sequentially in order of range block and so an index to each range is not required in the transformation output.

Massic transform s and o      11 bits [2]
Domain identity               16 bits
Geometric transform code       3 bits
Total                         30 bits

Table 1 – Fractal encoder output

The results in Table 1 give a poor compression ratio compared to the raw data. In the raw data each range block contains 4 bytes, or 32 bits, which gives a ratio of 1.067:1. Because the luminance channel is much more significant than the I and Q channels, the domain block matched for the luminance channel can be reused for the other two, with the parameters for an independent massic transform on each channel being the only extra data that needs to be stored. Table 2 is an updated view of the data required for each encoded range block.

3 × massic transform s and o    33 bits
Domain identity                 16 bits
Geometric transform code         3 bits
Total                           52 bits

Table 2 – Fractal encoder output updated to include colour transform information

52 bits for each range block compared to the 96 bits required for the 2x2 pixel block to be in colour gives an improved ratio of 1.85:1. Figure 6 shows the result of the inverse fractal transform iterated once, twice and ten times.

Figure 6 – First, second and tenth iterations of inverse fractal transform


3.2.6 Encoding performance improvements

Increasing the size of the range blocks reduces the number required to cover the entire image. The encoder was updated to allow the size of the range (and domain) blocks to be increased. If each range block is increased to 4x4 pixels, and each domain to at least 8x8 pixels, the compression ratio rises to 7.4:1. Likewise, if each range is 8x8 pixels and each domain at least 16x16 pixels, the compression ratio increases to 29:1. The output of the latter, which can be seen in Figure 7, is of much poorer quality than the former. Figure 8 lists the compression times and mean opinion scores for encoding the 250x250 Lena image with different range sizes.

Figure 7 – Image encoded with 4x4 range blocks compared to 8x8 range blocks

Range size         Domain size   Encoding time (s)   Compression ratio   MOS
2x2                4x4           870                 1.85:1              4
4x4                8x8           49                  7.4:1               4
8x8                16x16         33                  29:1                2
JPEG compression   -             <1                  16:1                5

Figure 8 – Encoding comparisons; JPEG compression performs much better.

Using a fixed size grid throughout the range and domain partitions wastes valuable space in memory, or in a stored bit stream, by including small range blocks where they are not needed; but if the range block size is increased, the quality is reduced. A method is needed to allow the partitioning to adapt to the content of the image, placing small range blocks at areas of detail where needed and larger range blocks where they will suffice, over flat areas of the image.


Several methods of partitioning the image to allow for different sized range blocks have been explored. Reference [13] discusses a quad-tree method which partitions the source image into large blocks and then repeatedly quarters each block until an acceptable matching error threshold is reached. Although it allows different block sizes, the method limits them to divisions of the largest (i.e. 32x32, 16x16, 8x8 pixels) rather than accommodating blocks of arbitrary size, and it does not allow the blocks to align with object boundaries. Fisher discusses an HV partitioning scheme which, whilst similar to quad-tree partitioning, aims to position blocks at regions of the image with diagonal features to encourage self-similarity between the same region at different scales [2]. Davoine, Antonini, Chassery and Barlaud [3] utilise a Delaunay triangulation to partition an image into arbitrarily sized triangular blocks for the range and domain. The triangulation is formed upon the image content and allows arbitrary alignment to the outlines of objects. The use of triangular motion blocks to code a video sequence was an area intended for investigation in this project, and the Delaunay triangulation, which allows partitioning at the boundaries of objects, appears an ideal starting point for detecting and predicting the motion of objects (see section 5.2).


3.3 Triangulated partitioned encoding

Delaunay partitioning of an image for fractal compression was chosen as the method to investigate further because of the versatility in the position and size of the blocks created; in addition, little research has been documented on the method since Davoine's papers and thesis [3,4]. No research extending the triangulation for use in motion compensation has been found, so the idea is somewhat novel and seemed worthy of investigation. Custom Delaunay triangulation algorithms, which are not inherently tied to any form of image or video sequence, were designed and produced for the project. The design of the algorithms allows direct integration with image processing for the still image fractal encoder and the motion compensator algorithms (see sections 4.9 and 5.4). Specifics of the algorithms for Delaunay triangulation, triangulation of the image and handling of triangular image data are discussed in section 4. The results of the triangular partitioned still fractal codec are first presented.

Figure 9 - The range partitioning

Figure 10 - The domain partitioning

Figure 9 shows the image overlaid with the range partitioning and Figure 10 shows the domain partitioning. Unlike the previous method the domain blocks do not overlap each other; however, the triangulation was performed with different parameters which produced larger domain triangles that each overlap more than one range block. Because the block size is arbitrary the domain blocks cannot be down sampled as a pre-processing step; each must be re-sampled for every range block it is compared against, which increases the processing load.


The process for computing the fractal transform is very similar to the simple grid partitioned encoder, but with a more sophisticated partitioning mechanism, and the block matching routines have been redesigned to facilitate examination and re-sampling of triangular data. The extra overhead of dealing with triangular blocks of data increases the processing time significantly; in addition, the partitioning time is also increased.

3.3.1 Optimisations

In order to optimise the encoding process, an initial and less computationally demanding assessment of the suitability of a domain block is made. A normalised histogram of each domain and range block is constructed; this can be generated in a pre-processing step and only needs to be done once. If the mean-square error between the normalised histograms falls below a threshold then a full exhaustive appraisal of the block is made, otherwise it is ignored. By using a normalised histogram the comparison is made independently of the brightness and contrast of the blocks, which would normally be accounted for during the exhaustive comparison. The normalised histogram also gives some consideration to the distance between the texture content of the blocks. If the initial comparison of normalised histograms produces no matches then the range block will not be mapped to a domain, which will cause errors during the inverse transform. The errors will be propagated to any other range blocks which are mapped to a domain covering the unmatched range block. See Figure 11.

Figure 11 – The orange circle highlights two range blocks which have not been mapped to a domain block. The yellow circles identify ranges based on domains which cover the unmapped range; the propagation errors are very evident.

A way to compensate for this is to relax the histogram error threshold for blocks that do not match, repeat the search, and continue to relax the threshold further if necessary until a block is found.
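A sketch of the pre-filter under these assumptions (the bin count and threshold are illustrative; `domain_sigs` is assumed to be an array of pre-computed signatures, one row per domain block). When the shortlist comes back empty the threshold can be relaxed and the search repeated, as described above:

```python
import numpy as np

def histogram_signature(block_pixels, bins=32):
    """Normalised grey-level histogram of a block (pre-computed once per block)."""
    counts, _ = np.histogram(block_pixels, bins=bins, range=(0, 256))
    return counts / max(counts.sum(), 1)

def shortlist_domains(range_sig, domain_sigs, threshold):
    """Indices of the domain blocks whose histogram mean-square error against
    the range block falls below the threshold; only these receive the full
    exhaustive comparison."""
    mse = np.mean((domain_sigs - range_sig) ** 2, axis=1)
    return np.nonzero(mse < threshold)[0]
```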


The number of domains that need to be inspected for each range can be further reduced by limiting the search to domain blocks in close proximity to the range being matched. Other optimisation techniques such as hierarchical-tree sorting [12] and domain block classification [1] have not been examined because the overall aim of the project is to develop a video codec; the principles of key-frame compression have been explored in considerable detail, and further research into optimising them would have taken valuable time away from the main focus.

3.3.2 Comparisons and results

Figure 12 presents a comparison of the encoder's performance on the 256x256 Lena image. The encoder comes closer to JPEG quality than the square block encoder did. It is clear that the partitioning of the image very much affects the quality of the encoded image. Again, if image compression were the focus of this project then the partitioning methods would be investigated further, perhaps looking for optimal split and merge thresholds.

# Range blocks   Encoding time (mins)   Compression ratio   MOS
12000            49                     4.37:1              5
4200             20                     12.5:1              4
3100             5                      17:1                4
3100 (local)     3                      17:1                4
2500 (local)     <2                     21:1                3

Figure 12 – Comparisons of different compression ratios obtained by varying the split and merge thresholds.

An RMS calculation is commonly performed to evaluate the errors produced by encoding schemes. The RMS calculation is very similar to that used to match the blocks in this prototype, measuring the error pixel by pixel. The fractal transform, however, does not quantise or throw away data but generates data which approximates the original; for that reason an RMS calculation is far less important and less suitable than a Mean Opinion Score evaluation, which is performed by people and is a qualitative analysis of the encoder's performance.


4 Delaunay Triangulation

A Delaunay triangulation of a set of points maximises the minimum interior angle over all triangles, producing triangles which are 'circular' in nature rather than long and thin, which would have two very small interior angles and one large, see Figure 13. Furthermore, given any two points which form vertices a and b of a triangle, the third vertex c, selected from the point set, produces the most 'circular' triangle of any candidate point in the set.

The triangles created by a Delaunay triangulation are desirable firstly because similarly shaped triangles are distorted less when mapped to each other, as in the fractal compression application. Secondly, circular triangles are more likely than long thin triangles to cover similar regions of an image, which has obvious benefits for image segmentation. A triangulation of a point set satisfies the Delaunay constraints if each triangle passes the 'empty circle' test [6], Figure 14. The test states that the circle passing through the three vertices making up the triangle must not contain, or have on its perimeter, any other point from the set.

Figure 14– Empty circle test for highlighted triangle

Figure 13– Arbitrary (left) and Delaunay (right) triangulations


4.1 Construction of Delaunay triangulation

Guibas and Stolfi introduce an incremental method (allowing insertion and removal of points after the initial triangulation) for triangulating a set of points, with a pseudo-code description which can be implemented in a high level language [5]. The method described is based on a data structure of edges and the relation of an edge to its neighbouring edges. This method was implemented initially but was deemed unsuitable because of the complexity of constructing and referencing complete triangles from the edge information alone. A new method was designed and implemented for this project with the data structure based on complete triangles. A few properties of each triangle, the coordinates of its vertices and the three neighbouring triangles, must be known, and several constraints adhered to in order to maintain a legal Delaunay triangulation, see Figure 15 below. The constraints must be maintained during all operations such as point insertion and removal.

Each triangle stores its three vertices (Vertex 0, Vertex 1, Vertex 2) and links to its three neighbouring triangles (Link 0, Link 1, Link 2), subject to three constraints:

1. Vertex 0 ≠ Vertex 1 ≠ Vertex 2 – each vertex must be unique.
2. Link 0 ≠ Link 1 ≠ Link 2 – the three neighbouring triangles must be different.
3. Link[0..2]->Link[..] = Triangle – each neighbouring triangle must hold a link back to the current triangle.

Figure 15 – Triangle properties and constraints


4.2 Point insertion

When a new point is inserted into the set, the triangulation must be partially recalculated to include it. Insertion of a point results in the triangle containing it being replaced with three triangles, each with a vertex at the new point (see Figure 16). The triangulation constraints discussed above are enforced by updating the links to neighbouring triangles, inheriting them from the triangle being replaced, Figure 17. In this implementation an empty triangulation (one with no defined points) actually contains one triangle which exists between three vertices at virtual infinity: (0,-inf), (-inf,inf), (inf,inf). The first point inserted into the set therefore also splits a single triangle into three, extending from the initial point to infinity.

Figure 16 – Point insertion splits the existing triangle into three

Figure 17 – Links to neighbouring triangles are inherited from the original triangle


4.3 Swapping edges

After a new point has been inserted into the triangulation, the new triangles that have been created must be verified against the empty circle test. The empty circle test fails if the independent point (not a vertex of the shared edge) of a neighbouring triangle lies within the circle passing through the three vertices of the triangle being inspected. If the test does fail, the orientation of the shared edge between the current triangle and the triangle with the illegal vertex must be swapped; Figure 18 shows the procedure in detail. One vertex from each triangle is swapped and the links to neighbouring triangles interchanged to maintain a valid data structure. If a swap does occur, the other two edges of the triangle that took part in the swap operation are then verified against the empty circle test. The iterated application of the empty circle test continues until no more invalid triangles are found.

Figure 18 – Swap procedure to enforce the empty circle test. The neighbouring triangle (white) has a vertex within the circle passing through the vertices of the highlighted triangle; the neighbouring triangle is oriented to face the current triangle so that they share the linked edge 2; the shared edge is then swapped to the alternative diagonal, with one vertex exchanged between the two triangles.
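A sketch of the empty circle test in its standard determinant form (assuming a, b and c are given in counter-clockwise order; exact-arithmetic robustness concerns are ignored here):

```python
def in_circumcircle(a, b, c, d):
    """True when d lies strictly inside the circle through a, b and c."""
    adx, ady = a[0] - d[0], a[1] - d[1]
    bdx, bdy = b[0] - d[0], b[1] - d[1]
    cdx, cdy = c[0] - d[0], c[1] - d[1]
    # 3x3 determinant with squared distances in the final column
    return ((adx * adx + ady * ady) * (bdx * cdy - cdx * bdy)
          - (bdx * bdx + bdy * bdy) * (adx * cdy - cdx * ady)
          + (cdx * cdx + cdy * cdy) * (adx * bdy - bdx * ady)) > 0
```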


4.4 Locating a point within the triangulation

As discussed above, inserting a point requires splitting the triangle containing the new point. Before the triangle can be split and the triangulation constraints imposed, the containing triangle must first be identified. The links between a triangle and its neighbours allow a fast 'walking' algorithm, like that in Guibas and Stolfi's paper, to be used. The search can begin at any triangle in the triangulation, with the relevant triangle identified by walking through the links. Figure 25 shows the result of the walking algorithm, which is discussed in more detail below; a description of the tools used in the location algorithm is first given. The walking algorithm relies on the constraints of the triangulation and is an example of how important it is that they hold true. A point can be shown to lie to the right of a vector between two independent points (Figure 19) if the point being analysed, the end of the vector and the start of the vector are arranged in counter-clockwise order [5], see Figure 20. Taking the three points in that order will generate a positive area for the triangle if they are arranged CCW, else a negative area will be calculated [5].

Figure 19 – Orange point is ‘right’ of vector

Figure 20 – CCW Orientation
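Both tests reduce to a sign check on a cross product; a sketch:

```python
def ccw(a, b, c):
    """Twice the signed area of triangle abc: positive if a, b, c are in
    counter-clockwise order, negative if clockwise, zero if collinear [5]."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

def right_of(p, start, end):
    """A point lies right of the vector start->end when the point, the end
    and the start are arranged counter-clockwise (Figures 19 and 20)."""
    return ccw(p, end, start) > 0
```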


If it can be shown that a point is to the right of every edge between each of the three vertices of the triangle (in a clockwise direction) then the point is contained within that triangle, Figure 21. If the point is left of even one of the edges the point is not within the triangle (Figure 22).

Figure 21 – Point is ‘right’ of each edge and is inside the triangle

Figure 22 – Point is not in the triangle, it is left of one of the edges (red).

A point is also considered to be within a triangle if it lies on the triangle's perimeter. The area formed by the two vertices of one of the triangle's edges and a point along that edge will always be zero. However, any point along the direction of the vector will have zero area, even if the point is not within the spatial extent of the edge along that vector, so a second criterion must be met which limits the point to the extents of the edge. By confirming that the point is on the circumference of, or within, the circle which passes through the triangle's three vertices, as well as having zero area with the two vertices of one of the edges, it is guaranteed that the point is on the triangle's perimeter, Figure 23.

Figure 23 – Both the blue and the red points have zero area with the vertices of the edge they lie upon. The blue point is also within the circle formed through the triangle's three vertices and so is on the triangle's perimeter. The red point is outside of the circle and therefore not on the triangle's perimeter.


4.5 Triangle Orientation

Many of the triangulation operations developed for this project, such as point insertion, merging, splitting and navigating around a local ring of neighbouring triangles, require the neighbouring triangles to be in a certain orientation. Three cases exist, and in each case the triangle being oriented is rotated until the case condition is satisfied:

1. 'orient to' – the current triangle is opposite the linked triangle.
2. 'orient clockwise to' – the oriented triangle is rotated so that navigating clockwise (Link 1) from it reaches the current triangle.
3. 'orient counter-clockwise to' – navigating counter-clockwise (Link 0) from the oriented triangle reaches the current triangle.

Rotation of the triangle is a simple matter of shifting the vertices and links of the triangle up or down one position such that:

Vertex 0 = old Vertex 2    Link 0 = old Link 2
Vertex 1 = old Vertex 0    Link 1 = old Link 0
Vertex 2 = old Vertex 1    Link 2 = old Link 1


4.6 Walking Algorithm

The identification of the triangle containing a point using the walking algorithm utilises the properties and algorithms outlined above. The algorithm is an iterative process which repeats several simple steps until the relevant triangle is identified. The block diagram in Figure 24 describes this process.

Figure 24 – Block diagram description of the walking algorithm (decision boxes: is the point on the perimeter of the current triangle; is the point within the current triangle; is the point to the right of triangle edge 0. Actions: orient the triangle to face the previous one; move clockwise or anti-clockwise to the next triangle; triangle found).

Figure 25– Walking algorithm locating the point identified by the blue circle. The white line identifies the edge that is oriented to face the previous triangle in the sequence.
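One plausible rendering of the walk, assuming hypothetical helpers on the Triangle record sketched earlier (`on_perimeter`, `contains`, `orient_to`, `right_of_edge`); the exact branch wiring of Figure 24 could not be fully recovered from the source, so treat this as indicative only:

```python
def locate(triangulation, p):
    """Walk from an arbitrary starting triangle to the one containing p."""
    tri, previous = triangulation.first(), None
    while True:
        if tri.on_perimeter(p) or tri.contains(p):
            return tri                        # triangle found
        if previous is not None:
            tri.orient_to(previous)           # face the triangle we arrived from
        previous = tri
        # step across to edge 0's clockwise or anti-clockwise neighbour
        tri = tri.links[1] if tri.right_of_edge(p, 0) else tri.links[0]
```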


4.7 Splitting

The application of a Delaunay triangulation in the scope of this project is to segment an image. Splitting a triangle is a simple case of inserting a new point within the triangle. In this implementation, when segmenting an image, the triangle is split at its barycentre. The barycentre of a triangle is the point at which the three lines from each vertex through the mid-point of the opposite edge cross, Figure 26. Equation 4 gives an efficient way to calculate the barycentre of the triangle.

Figure 26 – Splitting a triangle at the barycentre.

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} v1_{x} & v2_{x} & v3_{x} \\ v1_{y} & v2_{y} & v3_{y} \end{pmatrix} \begin{pmatrix} 1/3 \\ 1/3 \\ 1/3 \end{pmatrix}$$

Equation 4 – Calculation of the barycentre using the three vertices v1, v2 and v3 [14].


4.8 Merging

Merging neighbouring triangles is more complicated than splitting a triangle. A successful merge operation always removes two triangles from the triangulation. Figure 27 shows the steps of a successful merge operation:

1. Orient the triangles to form a counter-clockwise ring.
2. Select vertex 1 of the initial triangle as the origin of the new triangles.
3. Store vertex 2 of the current triangle.
4. Move to the next triangle in the ring.
5. Create a new triangle with vertices at the origin, the stored vertex 2 and vertex 2 of the current triangle.
6. Go to step 3 until the ring is completed.
7. Validate each triangle against the empty circle test and swap edges if required.

Figure 27 – Steps involved in merging a ring of triangles. a and b show the orientation of triangles to create a counter-clockwise ring, which continues around the whole ring. c-e show the completed ring, with f highlighting the point to be removed. Step g shows the point nominated as the origin for the new triangles, and h is the final result after each triangle has been verified against the empty circle test and the edges swapped where necessary.


When merging triangles it is important not to violate the constraints of the Delaunay triangulation; for that reason not all groups of triangles can be merged. Firstly, only a full ring of triangles can be merged; a partial merging of a ring would result in an illegal triangulation where one or more points are not joined correctly. A ring consists of three or more triangles, so it is only possible to merge groups of three or more triangles. In some cases the triangle which starts the merging process has triangles either one or two neighbours along which share a common neighbour. If the ring were merged with that triangle as the starting point, triangles would be created that overlap each other and have two links referencing the same triangle, which obviously violates the triangulation constraints. This does not mean that the ring cannot be merged: either the sub-ring causing the problem can be merged first, or another triangle in the ring can be chosen as the starting point. Figure 28 shows an example of such a case and the possible solutions.

Figure 28 – a, merging of a ring begins at the first highlighted triangle. b, the dark triangles share a neighbouring triangle and, c, the new triangles would overlap. Step d shows an alternative direction for merging. Steps e and f demonstrate merging a second ring before the intended ring, which is then successful. Step g highlights what would be a successful merging of the same ring starting from a different initial triangle.


4.9 Partitioning the image support with Delaunay triangulation

Ideally, image partitioning would result in segmentation of the image into the different objects in the scene; in practice the scene is partitioned into similar regions of grey scale which hopefully correlate to objects and the different parts of an object. For motion compensation, segmenting the image at the boundaries allows the movement of an object, or regions of an object, to be detected and tracked. The final triangulation has larger triangles over flat, homogeneous areas of the image and smaller triangles covering areas of detail. In the application of fractal compression this partitioning method allows for fewer range blocks without compromising quality.

Figure 29 – Initial grid partitioning of the image before the split and merge process. Notice that unlike the square partitioning in the earlier prototype each square is actually two triangles.

Image partitioning begins with overlaying an evenly spaced grid upon the image (Figure 29). Using the methods described above, a sequence of split and merge operations is performed until convergence: first the split operations converge, and then the merge operations.


4.9.1 Splitting triangles over an image

Each valid triangle in the triangulation covers a region of pixels in the image. The processes and issues involved in sampling and storing triangular regions of an image are discussed in detail in section 4.10. The triangles in a triangulation are stored internally in a linked list data structure. The 'split over image' operation starts at the head of the list and processes each triangle in turn. If a triangle is split, the three new triangles are added to the tail of the linked list; this is preferable to starting from the beginning of the list again, as it prevents re-examination of triangles previously deemed unsuitable for splitting. Another benefit is that convergence of the splitting process is easily detected, as it occurs when the tail of the list is reached.

grey level range = max(p) − min(p)

where p is a pixel in the data of the current triangle block

Figure 30 – Measuring the range of grey levels in the block

The pixel region covered by each triangle is examined for variance in grey level (Figure 30). If the range of grey levels exceeds a set threshold, which identifies the triangle as not being homogeneous in grey level, and the triangle's area is above a certain size, then a new point is inserted into the triangulation at the barycentre of the triangle; this effectively splits the triangle. Analysis and splitting proceeds through the entire list of triangles, including those created by previous split operations, which can be split further if necessary, until all triangles are either deemed homogeneous or are smaller in area than the minimum threshold allowed to be split.
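The split test itself is simple; a sketch, with illustrative threshold values rather than those used in the report:

```python
import numpy as np

def should_split(triangle_pixels, area, grey_threshold=48, min_area=16):
    """Split when the block's grey levels are not homogeneous and the
    triangle is still large enough to subdivide (section 4.9.1)."""
    grey_range = int(triangle_pixels.max()) - int(triangle_pixels.min())
    return grey_range > grey_threshold and area > min_area
```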


4.9.2 Merging triangles over an image

Merging occurs over a ring of neighbouring triangles; the details of the process from a structural viewpoint are described above in section 4.8. When merging triangles for image segmentation an extra check is introduced during the initial orientation phase of the merge process. The check compares the average grey level content of the neighbouring triangles in the ring being merged; if the difference between them is below a certain threshold then the process is allowed to continue, and if the threshold is exceeded the merge operation is aborted. It is important that the grey level values are compared against all triangles in the ring, as opposed to only the neighbour of the triangle currently being inspected, as the latter would allow the grey level difference between triangles to creep upwards and the overall variance in the ring to be extended to n*threshold, where n is the number of triangles in the ring, see Figure 31.

Figure 31 – In the left ring none of the triangles contain data which ranges more than 10 grey levels from any of the other triangles. In the right ring one triangle contains data which ranges more than 10 grey levels from the data in a neighbouring triangle.


4.9.3 Coding method for triangulation

The final triangulation must be reproducible during decoding without the original image support, which was used to control the split and merge operations. Two options exist.

Firstly, the location of each point could be stored; by definition, the Delaunay triangulation of the saved points will always reproduce exactly the same triangulation. A 128x128 pixel image can require over a thousand triangles and points, depending on the content and the split and merge thresholds (a 128x128 image of Lena required 1350 triangles and 1458 points for range partitioning). 1500 points translate to 3000 bytes, or just under 3KB of data to represent the triangulation; duplicating this for the domain partitioning gives around 6KB for the partitioning alone, which compared to the original file size of 49KB puts the maximum expected compression ratio for the file well under 10:1.

As a second option, the split and merge process could be encoded into a bit stream with a single bit specifying whether each split or merge was successful. If the decoder starts with exactly the same initial grid as the encoder and performs the split and merge operations identically (such as always adding new triangles to the tail of the list and skipping split attempts on triangles below a certain area) then the final triangulation will be the same as that produced during encoding. The same image required 1255 split and 22207 merge operations, which equates to 23462 bits or about 3KB for the range triangulation: the same as storing the coordinates of each point.

Optimisations in the merge operation were found which significantly reduce the number of merge steps that need to be both attempted and recorded. Because the merge routine simply moves through a list of valid triangles in the triangulation, it has no knowledge of the other triangles in the ring to be merged; a ring of ten triangles will have ten attempts made to merge it. If the first attempt succeeds the ring is destroyed, with new triangles created in its place; if the merge fails, however, an attempt will later be made to merge the same ring from another starting triangle, which will again fail. Further attempts on the same ring will continue to fail, wasting both processing time and valuable space in the bit stream.

The optimisation is realised by recording a failed attempt to merge a ring in the origin triangle's data structure. Before attempting to merge a triangle with its neighbours, a check is made against all triangles in the ring to see whether a merge attempt has already been made in the orientation of the current attempt (merging can occur in both clockwise and counter-clockwise orientations). If an attempt has been made then the operation must have failed (else the triangle would no longer exist in the list), so the attempt is aborted and neither a 0 nor a 1 need be entered into the bit stream. The decoder must reconstruct the failed merge operations when a failed bit is detected in the bit stream. The optimisation reduces the merge steps to 1912 attempts, which equates to 396 bytes: nearly 10 times smaller than the original method and less than 1% of the 49KB source file, allowing for much better compression prospects.


4.10 Triangular Regions of data

The triangulation of the image provides triangular blocks overlaying the image data which will be used in the fractal block encoder and for motion vectors in the video compression application. An efficient manner to scan, analyse and store the regularly spaced pixel data under the triangle is needed. The data must be scanned pixel by pixel so each can be analysed in turn, much like scanning a square block of data from top-left to bottom-right.

Computer memory has just one dimension; a block of memory can be defined by its logical offset in memory and its length. When storing the raw data of an image, the length of the memory block corresponds to the area of the image (width*height) multiplied by the bit depth of the image (say 24bpp, or three bytes, for a typical true colour image), with each pixel represented by that many bits in the block and each row of pixels ordered sequentially one after the other. For a rectangular image, which commonly uses a Cartesian coordinate system (x, or columns, along the width of the image; y, or rows, down the height), this block structure serves its purpose very well, because each row of the image falls at an exact multiple of the width (or some block-aligned adjusted width), which facilitates a fast calculation of the offset from the start of the memory block when locating the point in memory representing a specific pixel, Equation 5. Furthermore, the first pixel of each row in a rectangular block of pixels offset at some point in the image (x, y) will always be bpp*width (or aligned width) bits from the first pixel of the block on the previous row, Figure 32.

memory offset = x + (y × bw)

where x = pixel horizontal coordinate, y = pixel vertical coordinate, bw = width of the image in bytes

Equation 5 – Calculation of pixel offset in memory for a rectangular block

The majority of image processing in this project is performed on the 256 grey level luminance channel of the true colour image. Each pixel is represented by 8 bits or 1 byte. Any references to pixel data in the following section assume an 8bpp image depth.


Figure 32 – The highlighted region identifies a rectangular block of pixels in the source image. If the offset in the raw image memory of the blue point is known (it can be calculated using Equation 5) then the offset of the red pixel can be found efficiently by adding the width of the block to the blue pixel's offset. The memory offsets of successive rows can be found by adding multiples of the block width.

Figure 33 shows that for an arbitrary triangle (no inner angles are pre-defined or constrained unlike right angle or isosceles triangles) the first pixel of each row is not aligned at a regular offset from the start of the previous row.

Figure 33 – The blue pixel's offset in memory is known; the first red pixel is 19 pixels from the blue pixel, so the offset in memory of the red pixel can easily be calculated. The remaining red pixels are placed at regular intervals of 19 pixels; note that they are not at the beginning of each row of the triangular block, so this is no longer a correct way to calculate the starting offset of each row.


4.10.1 Barycentric coordinates

Even if the vertices of an arbitrary triangle are defined in Cartesian space, a more suitable way to traverse along the directions of the edges (dual to moving along the columns and rows of an image) is to use barycentric coordinates. Barycentric coordinates describe points within the triangle (or on its perimeter) in terms of the fractional contribution of each vertex to the overall position. Three parameters, U, V and W, define a point in barycentric space (as opposed to two, X and Y, in Cartesian) and, being proportional, take values between 0 and 1 inclusive, with the simple rule U+V+W = 1 that must be upheld (Figure 34).

Figure 34 – Barycentric coordinates of two different sized triangles; the vertices lie at (1,0,0), (0,1,0) and (0,0,1), and any point (U,V,W) within a triangle satisfies U+V+W = 1.


The pixel data enclosed by the triangle can be scanned by stepping through two of the parameters, say U and V, from 0 to 1 whilst the third, W, is calculated as W = 1-U-V. Pixels scanned whilst W >= 0 are inside the triangle, whilst those with W < 0 should be ignored as they lie outside it. U, V and W must be converted to Cartesian coordinates (X,Y) to obtain the location of the pixel in the image, see Equation 6. Although this method has some overhead, stepping through values of U and V that are not required because the calculated W is invalid, it is far simpler and less computationally expensive than setting up several gradients in Cartesian space, relating to the edge directions, and stepping through the triangle data that way. The triangular data is stored in an artificially aligned rectangular block of memory, see Figure 35.

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} v1_{x} & v2_{x} & v3_{x} \\ v1_{y} & v2_{y} & v3_{y} \end{pmatrix} \begin{pmatrix} U \\ V \\ W \end{pmatrix}$$

Equation 6 – Conversion of barycentric coordinates to Cartesian space.

Figure 35 – Rectangular regions of memory storing the triangular block data; a large region of the image is stored for illustrative purposes only – the blocks stored are normally much smaller. Note the identified masked regions, as discussed in section 4.10. Also notice how areas in the memory are distorted due to the mismatch between the barycentric and Cartesian grids.
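A sketch of the scanning pass described above, converting each (U, V, W) to Cartesian via Equation 6 and sampling at half-pixel steps (justified in section 4.10.2, below); it assumes the triangle lies within the image bounds:

```python
import numpy as np

def scan_triangle(image, v1, v2, v3):
    """Visit the pixels of a triangle by stepping through barycentric (U, V)
    and deriving W = 1 - U - V."""
    longest = max(np.hypot(a[0] - b[0], a[1] - b[1])
                  for a, b in ((v1, v2), (v2, v3), (v3, v1)))
    n = max(int(2 * longest), 1)     # half-pixel steps along the longest edge
    samples = []
    for i in range(n + 1):
        u = i / n
        for j in range(n + 1 - i):   # the inner loop stops where W would turn negative
            v = j / n
            w = 1.0 - u - v
            x = u * v1[0] + v * v2[0] + w * v3[0]   # Equation 6
            y = u * v1[1] + v * v2[1] + w * v3[1]
            samples.append(image[int(round(y)), int(round(x))])
    return samples
```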


4.10.2 Grid mismatch and sampling

Another issue is that the barycentric grid of the triangle does not necessarily align exactly with the Cartesian grid of the image, Figure 36. Also, the length of an edge in pixels, which can be calculated from its integer width and height components using Pythagoras, may itself not be an integer but a floating point number. Stepping along the edge components in whole pixels when scanning the data, or when drawing/analysing the triangle, skips whole image pixels and produces a distorted block of pixels. Nyquist's law (Equation 7) states that to guarantee all elements of a signal are sampled, the signal must be sampled at twice the frequency of the highest frequency component to be resolved. In this case there is just one component, the pixel, so sampling at half-pixel intervals along the triangle edges will ensure all pixels are sampled (or drawn/analysed).

Figure 36 – Mismatch between the barycentric and Cartesian grids. Many of the triangular regions cover more than a single pixel.


$$f_{sample} = 2 \cdot f_{max}$$

Equation 7 – Nyquist's sampling theorem

This process introduces a lot of overhead when compared to using rectangular blocks in the coding and analysis of the image/video. Sampling at twice the frequency requires four times as much memory as an equivalent rectangular memory block (imagine covering an entire image with two equal triangles), and therefore four times as much processing during any analysis. Statistical analysis of the triangle block data is important when matching and transforming the data to other triangles or other regions of the image: calculating the average grey level content of the block, or the mean-square error between it and another block, requires each pixel in the block to be processed. A method is needed to mask the memory block holding the triangle's data, which is larger than the area of the triangle, so that data outside the triangle (where W < 0) is correctly identified as garbage. An efficient and simple method is to adjust all grey level pixel values of 255 to 254 and set the garbage data to 255; this can be done during the sampling pass and has little overhead. The human eye is sensitive to fewer than 255 different grey levels, so the loss of the single grey level will not be noticeable, but it has huge benefits when calculating average content and rendering/re-sampling data: the algorithm knows to skip straight over data values of 255 without needing to reference an independent block of memory for a mask image or perform extra calculations to determine whether a memory point lies in the triangle. See Figure 35.


4.10.3 Re-sampling

Mapping one triangle onto another requires re-sampling of the data to the size of the destination triangle. The dimensions of the triangles can differ significantly, so many pixels in the larger triangle may contribute to a single pixel in the smaller one. A nearest neighbour method (choosing the pixel in the larger triangle closest to the relative position in the smaller) will lose valuable detail (see Figure 37). A more effective method is to average the pixels surrounding the relative point being translated, with the region directly proportional to the ratio of the dimensions of the two triangles. For example, if a triangle is being mapped to one three times smaller, the destination pixel will be the average of a 3x3 neighbourhood surrounding the pixel at the same relative position in the larger triangle. Although much more computationally expensive, the result is a vastly superior depiction of the triangle at a smaller size.

Figure 37 – Nearest neighbour re-sampling versus average-neighbourhood re-sampling


5 Video Compression

The goal in video compression is to reuse as much data from other frames as possible, only storing/transmitting the changes between the sequence of images. Video sequences contain a lot of temporal redundancy because areas of the images are repeated frame after frame. Current video codecs (such as MPEG) contain different types of frame data: intra-frames and predicted frames. An intra-frame is a full image, compressed in some manner, which can generally be reconstructed independently of other frames. A predicted frame contains information regarding only the differences between the previous frame and the current frame; construction of the complete frame requires the previous frame to be available, which may itself be constructed from predicted frames occurring before it. Figure 38 shows the principle graphically.

Figure 38 – Arrangement of intra-frames (I) and predicted frames (P) in a coded video sequence [11]


5.1 Global motion compensation

Sometimes the movement in a frame is not limited to small areas; the whole scene, or major parts of it, moves, for example when the camera pans from left to right [11]. A pixel by pixel comparison of subsequent frames in such a scenario would show very little correlation, and practically the whole image would need to be stored/retransmitted to depict the movement, as it appears that the entire data has changed. A more sensible approach is to describe the difference between the frames as a translation; then only the extents of the image where there is no overlap need to be coded as new image blocks.

Figure 39 – Example of global motion in a frame. The smaller rectangles represent the camera motion within the scene. Past frames are shown with dotted lines.

Figure 39 shows a scene with much global motion; the camera pans around as the actors talk. The only real movement within the scene is by the actors themselves; the room stays static except in how the camera views it. Figure 40 shows several frames of the sequence layered on top of one another; the motion of the camera is apparent from the background, which has been blurred by the movement. Figure 41 shows the result of global motion correction: the frames have been aligned, which is apparent from the background no longer being blurred. Blurring is instead seen around the actors' hands and heads, which are the areas of relative movement over the sequence.


Figure 40 – The motion of the camera produces a blurred background.

Figure 41 – The same frames after correction, showing good alignment in the background. Notice the black bars (the edges of the frame) on the left, emphasising the offset translation.


5.1.1 Process

The detection and correction process compares overlaid frames pixel-by-pixel over a range of offsets (Figure 43). The offset position which produces the smallest squared error (Equation 1) between subsequent frames is selected as the most probable global motion vector for the scene. A motion vector is calculated for each frame individually. The reduction in temporal redundancy (see Figure 42) is of great value because fewer blocks need to be analysed and coded for motion.

Figure 42 – The difference map on the top is busier than the one produced after global motion compensation. The reduction in differences between frames means there is less temporally redundant data.


Figure 43 – The orange outlines show how the frames are corrected for global motion by testing the correlation of the two frames at different offsets over the entire local neighbourhood.

The global motion detection and correction process in this project uses only the luminance data of the scene rather than the separate RGB channels, which increases the speed of the comparison by analysing just a third of the image data. In addition, tests were run on data down-sampled by factors of 2, 4 and 8. This increases the speed of comparison further, but the frames are not aligned as accurately because a reduced resolution (to within the nearest down-sampling factor) is effectively used and ultimately enforced on the offset vector produced. Nevertheless, the concept of the technique has been proven and the algorithms developed are used in the final prototype codec of this project. Time did not permit optimisation attempts, such as refining a coarse offset vector or focusing on particular regions of the image (such as the centre).
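A sketch of the search under the assumptions above (luminance-only frames, exhaustive offset search minimising the squared error); the function name and search radius are illustrative:

    import numpy as np

    def global_motion(prev_y, curr_y, search=8):
        """Return the (dx, dy) offset minimising the mean squared error
        between the overlapping regions of two luminance frames."""
        h, w = curr_y.shape
        best, best_err = (0, 0), np.inf
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                # windows common to both frames at this offset
                a = prev_y[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)]
                b = curr_y[max(0, -dy):h + min(0, -dy), max(0, -dx):w + min(0, -dx)]
                err = np.mean((a.astype(np.float32) - b) ** 2)
                if err < best_err:
                    best, best_err = (dx, dy), err
        return best

Running the same search on frames down-sampled by 2, 4 or 8 shrinks the work per offset by the square of the factor, at the cost of quantising the recovered vector to that factor.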


5.2 Block Matching

After global motion compensation, the differences between subsequent frames are the result of movement of the objects within them, or of changes in angle to the scene which cannot be compensated for fully by a single global motion vector, such as the viewpoint being angled. The logic behind motion compensation is that the areas of the image that have changed are similar to regions close to the origin of the area in the previous frames, and indeed in subsequent frames. For example, Figure 44 shows subsequent frames from a news reader sequence; the changes between frames are a result of the news reader moving her head and mouth. Rather than transmitting all of the data in the regions that have moved, the image can be partitioned into blocks and a motion vector assigned to each block. For forward motion prediction, blocks containing movement in the current frame are compared to blocks of the same size in the previous frame over a range of offsets. Typically blocks are matched by minimising the least-mean-square or least-square error within the nearby neighbourhood, Equation 1 [8,9].

Figure 44 – Four frames from the news reader sequence, and the emphasised difference map between them


5.3 Square based block matching

The results of a simple block matching algorithm are shown in Figure 45. The algorithm splits the image into blocks of 16x16 pixels. Inspection of block matching from another sequence (Figure 46) identifies issues with the method. Movement is confined to square blocks, which have no regard for the content of the image, such as object boundaries. Objects and their motion are not inherently square, and the errors produced by forcing them to be so soon propagate over several frames to produce a very poor quality prediction (Figure 46). Most block matched motion compensators encode low bit rate correction information to reduce these effects.

Figure 45 – Output from block based motion compensator

Figure 46 – Propagation of matching error over just four frames. Traditional block matched compensators transmit error compensation after the motion vectors, so the propagated errors would not be so apparent.
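A minimal sketch of forward block matching as described above (16x16 blocks, exhaustive least-square search); the names and search radius are illustrative assumptions:

    import numpy as np

    def match_block(prev, curr, x, y, size=16, search=7):
        """Search the previous frame around a block's origin for the
        candidate minimising the squared error; return the motion vector."""
        block = curr[y:y + size, x:x + size].astype(np.float32)
        best, best_err = (0, 0), np.inf
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                sy, sx = y + dy, x + dx
                if 0 <= sy <= prev.shape[0] - size and 0 <= sx <= prev.shape[1] - size:
                    err = np.sum((block - prev[sy:sy + size, sx:sx + size]) ** 2)
                    if err < best_err:
                        best, best_err = (dx, dy), err
        return best, best_err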


Another shortcoming of traditional block matching is that several motion blocks may describe the motion of a single region of the image which they all cover; this is wasteful, and defining a single, larger motion block would be a more efficient use of memory. This leads to a problem similar to the one discussed for fractal image compression in section 3.2.1: increasing the block size decreases the accuracy of the block matching for finer details. A method for dynamically controlling the size of the blocks to suit the image content is again needed and, as seen previously, Delaunay triangulation of the image provides just that.

5.4 Triangular block matching

Using Delaunay triangulation as the basis of block matched motion compensation appears unique to the project, as no documented experiments could be found online or in journals. The development process of the technique and prototype was therefore very experimental, and the code produced is of rapid-development quality. This does not mean that the code contains bugs or does not perform properly, but rather that it has been developed for flexibility and robustness so that the algorithms can be changed with minimal rewriting. This does, however, lead to rather slow running performance; it should be possible to optimise the speed of the algorithms once the design of the codec is more final, but time constraints have not allowed much progress on optimisation. Using triangular blocks instead of the classical square blocks carries many of the drawbacks noted when discussing the fractal compressor, such as increased complexity and processing time. The results, however, are of interest and rather unique compared to commonly used methods. A discussion of the processing steps of the codec is given below.


5.4.1 Process

The encoding process begins by triangulating the image using a split and merge process, as was performed on a still image in section 4.9. The triangulation is better suited to defining blocks than the grid block matching method above because the blocks are of arbitrary size, so they can cover large regions of movement, and because they are naturally aligned to object or segment boundaries; it is the motion of objects that the codec is trying to predict. A difference map is produced by subtracting the current frame from the previous frame. The difference map is then emphasised by taking the absolute difference, removing small differences (less than four grey level values) and passing the result through a morphological filter to remove any stray pixels. Stray pixels are defined as having no neighbouring pixels which indicate a difference. The emphasised difference map can be seen in Figure 47.

Figure 47 – Emphasised difference map
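The emphasis step could be sketched as follows; the threshold and stray-pixel rule are as described above, while the simple 8-neighbour test is an assumption about the actual morphological filter used:

    import numpy as np

    def emphasised_difference(prev, curr, threshold=4):
        """Absolute difference, small changes removed, stray pixels
        (difference pixels with no differing neighbour) filtered out."""
        diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        mask = diff >= threshold
        padded = np.pad(mask, 1)
        # count differing pixels in each pixel's 8-neighbourhood
        neighbours = sum(padded[1 + dy:1 + dy + mask.shape[0],
                                1 + dx:1 + dx + mask.shape[1]]
                         for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                         if (dy, dx) != (0, 0))
        return mask & (neighbours > 0)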


The intersection of the emphasised difference map and the triangulated frame k is then taken (Figure 48); this identifies which triangular blocks in the new frame contain differences from the same block in the previous frame, indicating motion.

Figure 48 – Intersection of triangulation and emphasised difference map. The triangles which remain in the intersection cover areas of the frame containing motion.

A search for a block with the same dimensions is then performed over the data in the previous frame around the block's origin, minimising the least squared error to find the best match. The search does not inspect the block at rotated orientations; this was investigated initially, but it was found that fewer than 1% of blocks matched a rotated version better than a non-rotated version. Processing the rotated blocks is very computationally intensive, so rotation was excluded from future prototype versions of the codec as a compromise, and to reduce the memory required to code each motion block. Some optimisation techniques are discussed below. For each triangular block a motion vector is produced which captures the offset that matched the block best in the search, Figure 49.

Figure 49 – Motion vectors for two motion blocks


[Flow diagram: frames k-1 and k are differenced; the difference map is emphasised; a Delaunay split-and-merge triangulation is intersected with it; individual triangles are block matched, minimising error; frame k is reconstructed from the block matches, and the process repeats for the next frame]

Figure 50 – Overview of motion prediction process

Huang and Zhuang describe how the luminance of a block does not change between frames as it is translated along the defining motion vector [15]. However, for the majority of the motion blocks, including a massic transform produces more suitable matches (the squared error is smaller) and is therefore beneficial. Each motion block needs to be coded with the following data members:

Massic transform x 3    18 bits
Motion vector            8 bits
Total                   26 bits

The massic transform can be encoded in fewer bits than was needed for the fractal encoder because the matching blocks are closer in contrast and brightness: only a local neighbourhood is examined, so the candidate blocks are in close proximity to the block being matched and are most likely part of the same object.
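A sketch of how such a record might be packed; the field widths for the three massic parameters and the vector components are assumptions for illustration, as the prototype's actual bit layout is not documented here:

    def pack_motion_block(massic, vector):
        """Pack three 6-bit massic parameters (18 bits) and a motion
        vector with 4-bit signed components (8 bits) into one integer."""
        m1, m2, m3 = massic            # each pre-quantised to 0..63
        dx, dy = vector                # each in -8..7
        word = (m1 & 0x3F) << 20 | (m2 & 0x3F) << 14 | (m3 & 0x3F) << 8
        return word | (dx & 0xF) << 4 | (dy & 0xF)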


5.4.2 Optimisations

Optimising the encoding routine is a matter of reducing the number of data comparisons made for each motion block. The simplest optimisation is to introduce a lower threshold limit: if the squared error of a block comparison falls beneath the threshold, the search is abandoned immediately and the relative position of the block is recorded as the motion vector. Another powerful optimisation is to reuse the processing performed when matching other blocks in the local neighbourhood. Before carrying out an exhaustive search, the neighbours of the triangle being matched are inspected. If a neighbouring triangle is flagged as containing motion, and a motion vector has been calculated for it, that motion vector is adopted for the current triangle and tested for suitability. If the adopted motion vector produces at least as good a match as it did for its original block, the vector is kept permanently and the exhaustive search is never performed for the current triangle. If it matches poorly, it is rejected and another of the neighbouring triangles is examined. If none of the neighbouring triangles produces a good match, the exhaustive search proceeds.
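The neighbour-reuse shortcut might look like the following sketch, with the exhaustive search, the vector test and the acceptance criterion passed in as stand-ins for the codec's real routines (all names are illustrative):

    def find_motion_vector(triangle, neighbours, exhaustive_search,
                           test_vector, good_enough):
        """Try vectors already found for neighbouring triangles before
        falling back to the exhaustive search.

        exhaustive_search(triangle)   -> (vector, error)
        test_vector(triangle, vector) -> squared error of reusing vector
        good_enough(error)            -> True if the match is acceptable
        """
        for neighbour in neighbours:
            vector = getattr(neighbour, 'motion_vector', None)
            if vector is not None and good_enough(test_vector(triangle, vector)):
                return vector                 # adopt the neighbour's vector
        vector, _ = exhaustive_search(triangle)
        return vector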


5.4.3 Initial outputs

Figure 51 and Figure 52 show the results of the triangular block matching. The algorithm does not suffer from visible blocking. The smearing artefacts that exist are more similar to those found in fractal image compressors.

Figure 51 – Output from triangulation block matching

Figure 52 – Triangular block matching of frames from the football sequence. Note the block boundaries are not as apparent as with the square block predictor above.


5.4.4 Extending motion vectors

So far movement has been tracked and compensated between subsequent frames only. The motion detected between frames 10 and 11 is discarded when frames 11 and 12 are considered; the motion between these two frames is detected from scratch as if frames 10 and 11 had never been examined. Usually the motion of an object occurs across more than two frames, and often it is constant across many; imagine a ball moving over a snooker table or a car moving across the screen. When the motion blocks are constructed, the encoder examines each block and identifies where it most likely came from in the previous frame, generating a motion vector with both an x and a y component. To allow detected motion to propagate through to more frames in the sequence, the motion vectors are added to a pool. Before the next frame in the sequence is analysed for motion, a new step is introduced ahead of the intersection of motion blocks with the difference map. The new stage examines each of the stored vectors in the pool and projects its movement through to the current frame. Note the codec is no longer asking 'where did this block come from' but 'does this block move here'. The projection is made along the direction of the motion vector but at different scales, to allow for acceleration and deceleration of the movement. See Figure 53.

Figure 53 – The right hand motion vector has been extended over another four frames. The left hand motion vector was not suitable after the initial movement and was not extended.


After projecting the motion through to the current frame, the difference map is referenced to see if any motion exists at the projected point. If no motion is detected, the next stored vector in the pool is examined, until all have been processed, at which point the normal motion compensation algorithm takes over. If motion is found at the projected point, the suitability of the match is once again evaluated by calculating the squared error. The comparison is made between the block at its position in the previous frame and the block at the projected position in the current frame; note that the previous-frame data is itself an approximation, reconstructed from the frame which formed the origin of the motion vector. The normal exhaustive search is also performed for the motion block after it has been projected to the current frame, and its result is compared to that of extending the existing motion vector. If the two results are comparable, the existing motion vector is reused; otherwise it is removed from the pool.

If the existing vector is reused, the difference map is cleared for the region covered by the projected motion block so that no new motion vectors are calculated for the region. It is important that the difference map is cleared rather than updated to reflect the prediction error, as the latter would cause more motion blocks to be calculated, defeating the purpose of the procedure. Reused motion blocks remain in the pool for use in future frame matching. This means that some motion vectors may remain in the pool and be used in motion compensation across many frames of the sequence, until the motion ceases or the match is no longer of high enough quality, at which point new motion vectors are constructed. Figure 54 shows the functional block diagram updated to include motion vector extension.
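One pass of the extension stage could be sketched as below; the pool entries and the four helper routines are illustrative stand-ins for the codec's actual data structures:

    def extend_vectors(pool, has_motion, reuse_error, search_error, clear_region):
        """Project each pooled vector into the current frame at several
        scales (allowing acceleration and deceleration); keep it only if
        motion exists at the projected point and the reuse match is
        comparable to a fresh exhaustive search.

        pool         -- list of dicts with keys 'x', 'y', 'dx', 'dy'
        has_motion   -- has_motion(x, y): difference map set at (x, y)?
        reuse_error  -- reuse_error(entry, x, y): squared error of reuse
        search_error -- search_error(entry): error of exhaustive search
        clear_region -- clear_region(x, y): wipe difference map under block
        """
        kept = []
        for entry in pool:
            candidates = []
            for scale in (0.5, 1.0, 1.5, 2.0):   # allow for (de)acceleration
                x = int(entry['x'] + scale * entry['dx'])
                y = int(entry['y'] + scale * entry['dy'])
                if has_motion(x, y):
                    candidates.append((reuse_error(entry, x, y), x, y))
            if candidates:
                err, x, y = min(candidates)
                if err <= search_error(entry):   # comparable: reuse the vector
                    clear_region(x, y)           # stop new vectors forming here
                    entry['x'], entry['y'] = x, y
                    kept.append(entry)           # survives for later frames
        return kept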


[Flow diagram: as Figure 50, with new stages: active motion vectors are extended from the cache, the emphasised difference map is updated, extended matches are compared to the exhaustive search, and the active motion vector cache is updated before the next frame]

Figure 54 – Final block functional diagram


5.4.5 Encoder performance and comparison

Compression methods which quantise DCT or wavelet transforms often discard a lot of small detail, and on some occasions the main focal point of the sequence is lost. An example is a video sequence of a football match: the ball, which appears very small in wide angled views, can disappear from the coded sequence completely for several frames. This is most likely to happen when there is also a lot of motion in the scene from the players and the camera panning. The prototype codec has shown that small items such as a ball are not lost when the scene contains both panning and other motion. The reason is that the triangulation picks up all areas of motion and, more importantly, small details are not discarded by quantisation but rather constructed from surrounding regions, so at the very least a rough approximation is presented, which is better than losing the object completely. See Figure 55.

Figure 55 – Tracking of small objects


5.4.6 Embedded fractal transform properties

Combining the motion vector with the massic transform introduces a novel and unexpected property, which can be seen in Figure 56. The first frame in Figure 56 is just before a scene change, the next frame coincides with the scene change, and subsequent frames continue from this point in the sequence. What is interesting is how the detail after the scene change is at first very poor but builds up over the next few frames, in a similar fashion to repeated iterations of a fractal transform. This is largely due to the massic transform, which is effectively performing a fractal transformation using the data in the immediate neighbourhood as domain blocks.

Figure 56 – Iterative fractal nature of triangular compensator


Figure 57 is a graph showing the usage of motion blocks over a sequence of 38 frames. Note that no error compensation has been sent to complement the motion vectors, and no key-frames (other than the very first frame) have been inserted; the iterative property noted above is used as a substitute. The graph traces the content of the stored vector pool, showing how many new vectors are added, how many are extended and how many are removed over the sequence. With extended motion vector support disabled the compression ratio of the sequence was 30:1; when extended motion vectors are permitted the compression ratio increases to 59:1.

[Graph: instance count (0 to 800) against frame number (1 to 38), three series: Kept, Trashed, New]

Figure 57 – Extended motion block usage across frame sequence


The graph in Figure 58 plots how many motion vectors are reused over a 300 frame sequence. The very distinct dips in the plot, where virtually no motion vectors are reused, correspond to the three scene changes in the sequence. The lull near the middle arises at a point where a large new object enters the scene. The blocks used at the scene changes are not actually tracking any motion but are constructing the new frame by applying a massic transform to data in the neighbourhood of each block. The sequence is compressed at an 81:1 ratio.

Figure 58 – Plot of reused motion vectors over the 300-frame sequence (instance count against frame number).

5.4.7 Saving motion vectors

The present implementation writes the motion vectors to a file on the computer's hard drive as they are removed from the stored vector pool. The motion vectors cannot be written to the file earlier because the complete extension of each is not known until it is no longer used. The consequence is that the motion vectors are stored out of sequence and out of sync with the saved triangulation bit-stream. A second pass of the output file will be required after encoding is complete to reorder the data. Not enough time was available to complete this stage, so the prototype produces data that is out of order; all data is stored, however, so the compression ratios are correct.
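The missing second pass is straightforward in principle; a sketch under an assumed record layout (a frame number and length prefix per record, which is an illustration, not the prototype's actual format):

    import struct

    def reorder_records(in_path, out_path):
        """Read (frame, payload) records written in eviction order and
        rewrite them sorted by frame number."""
        records = []
        with open(in_path, 'rb') as f:
            while header := f.read(8):
                frame, length = struct.unpack('<II', header)
                records.append((frame, f.read(length)))
        records.sort(key=lambda record: record[0])
        with open(out_path, 'wb') as f:
            for frame, payload in records:
                f.write(struct.pack('<II', frame, len(payload)) + payload)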


5.5 Results

Figure 59 and Figure 61 show how different codecs perform on two test sequences. Comparisons are made between four commercial or publicly available codecs and the final prototype produced for the project. Evaluation of the encoding quality is by mean opinion score (MOS).

In Figure 59 all codecs compress to a similar level. Even DivX 4 achieves only a modest ratio, which is surprising as it is commonly used to compress full-length movies and is known to perform well. The poor performance of the DivX 4 codec, and of the others, may be attributed to the sequence being quite short. Certainly in the case of the project prototype, a longer sequence would have provided more opportunity for motion vector extension and so a higher compression factor. The Cinepak codec was least favourable; its output appears very blocky, with large distortions to object detail (Figure 60).

Codec            File Size (KB)   Compression Ratio   MOS
Project Method        470               14.25           4
DivX                  488               13.73           5
Cinepak               619               10.82           2
Indeo 5.1             592               11.31           3
DivX 4                313               21.40           5

Figure 59 – Compression comparison on the 70-frame 'politician' sequence, containing camera panning and a moderate amount of object motion.

Figure 60 – Still from the Cinepak-coded sequence


The project prototype achieved the best compression ratio for the second sequence, Figure 61. Note that although the Cinepak codec achieved a good score (4), its compression ratio is very poor compared with the other codecs. DivX and DivX 4 obtained a higher score than the prototype, which produced a few anomalies in the coded scene, such as a slight flickering in some of the triangles; the general consensus was that the DivX and DivX 4 codecs produced a sharper image. None of the subjects who performed the scoring noticed the iterative effect on scene changes (Figure 56), probably because the distortions are minimised within a few frames and during this time the eye is familiarising itself with the new scene rather than the details within it.

Codec            File Size (KB)   Compression Ratio   MOS
Project Method        539               81.00           4
DivX                  658               66.33           5
Cinepak              4416                9.88           4
Indeo 5.1             956               45.65           3
DivX 4                711               61.39           5

Figure 61 – Compression comparison on the 300-frame 'hallway' sequence containing several scene changes.


A third test sequence (Figure 62) raises some interesting points about human perception and what is important when watching movement. The sequence runs over 400 frames and shows a fight scene in a training dojo. The camera pans, zooms and cuts erratically, leaving little opportunity to extend motion vectors over more than a few frames. The camera's viewpoint changes dramatically every few seconds, which, for the encoder, is as good as a scene change.

Figure 62 – Third test sequence

Figure 63 – Selected frames from the coded sequence

Figure 63 shows selected frames from the coded sequence. The two left frames exhibit a lot of triangular blocking; they both occur on a scene change. The two right images are each a few frames later; notice the considerable amount of detail that has been reconstructed due to the embedded fractal-like behaviour. When the sequence is viewed at normal speed the blockiness is not noticeable; it is not until the sequence is slowed down, or played several times over, that a subject notices the triangular blockiness and smearing. The brain appears to concentrate on what is moving and how, rather than on what is happening within the moving regions (the details); the triangular block matching facilitates a very accurate reconstruction of the movement, which is perhaps all the brain has time to process and analyse. Traditional block matching forces the motion into square blocks which do not follow the irregular movement of real-life objects. Furthermore, the eye is more sensitive to some orientations than others: the brain will pick out harsh horizontal and vertical distortions much more easily than angled ones.


6 Conclusions

The project has been successful in that the design briefs have been met: still frame fractal compression and video coding have been researched, a number of working prototypes have been produced, and comparisons have been made to existing technologies.

6.1 Discussion

Initially much of the energy of the project was focused on still frame compression, which research showed to be a very important aspect of video coding. Several prototype fractal encoders were developed, with increasing complexity, to evaluate the methods identified as research progressed. When a technique had been proven it was either developed further (as was the case for the different partitioning methods) or accepted as plausible but not implemented in later prototypes, to maintain the rate of development (as with encoding colour fractal blocks). With the implementation and success of integrating Delaunay partitioning with the grey scale fractal codec, further work on still image compression was stopped. Although the quality was not as good as other methods (such as JPEG), the project was not focused on image compression and it was important not to let development drift that way.

It was expected that implementing a Delaunay triangulation algorithm would not be too time consuming; in reality, however, the papers and the designs therein were not directly applicable to working with entire triangles. Considerable effort was expended on producing a custom Delaunay algorithm and library that could be easily integrated into the still frame compressor and then extended for use in motion compensation. Techniques learnt from the initial papers aided the development of the Delaunay triangulation, and further research into the properties of triangles themselves led to a very complete, robust and tailored solution.

As mentioned in the main body of the report, triangle based motion prediction has not been documented; at least, the research for this project could not uncover any examples. Triangles seem an obvious choice for describing motion (a triangle is the simplest geometric area, from which any other can be constructed) because motion can occur at any angle and is not restricted to harsh horizontal and vertical constraints (such as those imposed by square block matching).


The results obtained, especially with the inclusion of the extended motion vector routines, are very promising. Compression ratios on the test sequences are comparable to codecs in use today, after only a few months' development. The codec tracks motion well and can generate long sequences of high quality frames. The motion of small objects is tracked, and the codec is very robust to large changes in the scene, or even complete scene changes, without the need for intra-frames. The drawback of the method is the encoding speed, which averages between 2 and 5 frames per minute; the 300 frame test sequence took over two hours to encode. However, the implementation is of prototype quality, and optimisation of the code and of the technique itself will lead to more usable encoding times. Indeed, if Moore's law continues to hold, in a few years' time computers will have progressed enough to make the current implementation usable. Unfortunately there was not enough time to complete the project, and a number of important areas remain unfinished:

1. Integrating the colour coding into the triangle fractal compressor and the motion compensator.
2. Implementing the remaining global motion transforms (scale and rotate).
3. Performing the second pass over the coded video data.
4. Decoding a motion sequence from an encoded bit-stream on disk.
5. A real-time viewer of the encoded stream.

The code developed for the project has been through a series of rapid prototyping iterations, and so no formal class, data or functional design diagrams have been produced. Code design is important in producing quality engineering solutions and must not be overlooked in the final implementation of this project.


6.2 Applications

The applications of video compression are vast; a low bit rate encoder, which the prototype may develop into, has applications on handheld devices such as mobile phones, PDAs and heads-up displays. This prototype is particularly suited to streaming video: if the data channel becomes congested, the bit rate can be dropped without losing the movement of the video, only some of the detail, as was seen earlier.

6.3 Future research

Optimisation of the encoding process should be an important ongoing concern during any further development, but should not be placed above developing other areas of the project. It would be sensible to evaluate the need for intra-frames in the video coder before much more development of the still image compressor; the video encoder handles scene changes well by itself and does not suffer badly from propagated matching error, because it generates new content, so there may be no benefit in including intra-frames at all. A hardware implementation of the encoder would be an interesting area of research, as would developing hardware/software deployment solutions such as handheld or personal viewers. It is not expected that the stream (particularly the partitioning bit-stream) will be tolerant of noise, and any errors may well be catastrophic to the decoder, especially if no self-synchronising mechanism like intra-frames is incorporated. A method of streaming the coded data and making it tolerant of errors would also be an interesting area of research.


References

[1] Fractal Image Compression through Iterated Function Systems, Pulcini, Verrando, Rossi and Meloni – La Sapienza University, Rome, 1995.
[2] Fractal Image Compression, Yuval Fisher – Department of Mathematics, Technion Israel Institute of Technology, 1992.
[3] Fractal Image Compression Based on Delaunay Triangulation and Vector Quantization, Davoine, Antonini, Chassery and Barlaud – IEEE Transactions on Image Processing, vol. 5, no. 2, pp. 338–346, February 1996.
[4] Adaptive Delaunay Triangulation for Attractor Image Coding, Davoine and Chassery – Laboratoire TIMX-IMAG, no date.
[5] Primitives for the Manipulation of General Subdivisions and the Computation of Voronoi Diagrams, Guibas and Stolfi – ACM Transactions on Graphics, vol. 4, no. 2, pp. 74–123, April 1985.
[6] On the Conversion of Ordinary Voronoi Diagrams into Laguerre Diagrams, Anton and Mioc – Department of Computer Science, no date.
[7] Real-Time Implementation of Fractal Image Encoder, Rejeb and Anheier – Institute for Electromagnetic Theory and Microelectronics, University of Bremen, no date.
[8] Motion Compensation Using Adaptive Rectangular Partitions, Dubuisson and Davoine – Université de Technologie de Compiègne, 1999.
[9] Fractal Block Coding of Digital Video, Lazar and Bruton – IEEE Transactions on Circuits and Systems for Video Technology, vol. 4, no. 3, June 1994.
[10] Optimal Hierarchical Partitions for Fractal Image Compression – IEEE International Conference on Image Processing (ICIP'98), Chicago, October 1998.
[11] Video Compression Technology – Sony Broadcast & Professional Europe, no date.
[12] Speeding up Fractal Image Compression, Behnam Bani-Eqbal – Department of Computer Science, University of Manchester, 1994.
[13] Fractal Image Compression: An Introductory Overview, Saupe, Hamzaoui and Hartenstein – no date.
[14] Dr. Math Forum, http://mam2000.mathforum.org/dr.ath/problems/nobel.6.99.html
[15] An Adaptively Refined Block Matching Algorithm for Motion Compensated Video Coding, Huang and Zhuang – IEEE Transactions on Circuits and Systems for Video Technology, vol. 5, no. 1, February 1995.
[16] Delaunay Triangulations, Glenn Eguchi – October 2001.
