Fast Block Motion Estimation With 8-Bit Partial Sums Using
SIMD Architectures
Presented by: Ahmed Abdel-Hafeez, Ahmed El-Bohy, Ahmed Emam, Ahmed Kandil
Supervised by/Presented to: Prof. Dr. Attalah Hashaad
Paper by: Chunjiang J. Duanmu et al., published in August 2007.
Outline
• Abstract
• Introduction
• 8-bit partial sums
• Multilevel 8-bit partial sums
• Computational complexity
• Simulation results
• Conclusion
ARAB ACADEMY-CAIRO Fast Block Motion Estimation With 8-Bit Partial Sums Using SIMD Architectures spring 2013 slide
Abstract
• Fast block motion estimation algorithms are needed for real-time implementations of video coding standards due to the high computational complexity of the full-search algorithm for block motion estimation.
• In this paper, an algorithm using 8-bit partial sums of 16 luminance values for fast block motion estimation is proposed. The partial sums are employed to reduce the computational complexity of not only the full-search algorithm but also some of the fast block motion estimation algorithms while maintaining their accuracy.
• Furthermore, it is shown that the byte-type data parallelism of an SIMD architecture can be utilized to access and process these partial sums concurrently to accelerate motion estimation.
• Simulation results are presented to demonstrate that the use of the partial sums can significantly accelerate the execution of the full-search and other search algorithms on an SIMD architecture.
Introduction - Applications
Introduction - Basics
Chronological Table of Video Coding Standards
The objective of video coding is to compress moving images.
• ITU-T VCEG: H.261 (1990), H.263 (1995/96), H.263+ (1997/98), H.263++ (2000)
• ISO/IEC MPEG: MPEG-1 (1993), MPEG-4 v1 (1998/99), MPEG-4 v2 (1999/00), MPEG-4 v3 (2001)
• Joint: MPEG-2 (H.262) (1994/95), H.264 (MPEG-4 Part 10) (2002)
Introduction - Basics - Video
[Figure: Frame 1, Frame 2, Frame 3, Frame 4]
• Luminance (Y): describes the brightness of the pixel.
• Chrominance (CbCr): describes the color of the pixel.
Introduction - Basics - Video Data Drawback
• Uncompressed video data is large.
– This is due to data redundancy. There are two general types of data redundancy in a video:
Spatial redundancy
In a frame, adjacent pixels are usually correlated, e.g. the grass is green in the background of a frame.
[Figure: Frame 1, Frame 2, Frame 3, Frame 4]
Temporal (time-based) redundancy
In a video, adjacent frames are usually correlated, e.g. the green background persists frame after frame.
Introduction - Basics - Video Compression
• Predict the current frame based on previously coded frames.
• Types of coded frames:
– I-frame: intra-coded frame, coded independently of all other frames
– P-frame: predictively coded frame, coded based on a previously coded frame
– B-frame: bi-directionally predicted frame, coded based on both previous and future coded frames
Block Matching
Motion Estimation
• What is motion estimation?
– Predict the current frame from the previous frame.
– Determine the displacement of an object in the video sequence.
– The amount of data to be coded can be reduced significantly if the previous frame is subtracted from the current frame.
Motion Estimation Classification
• Block-Based Motion Estimation Algorithms
– Time-domain Algorithms: Matching Algorithms (Block-Matching, Feature-Matching); Gradient-Based Algorithms (Pel-recursive, Block-recursive)
– Frequency-domain Algorithms: Phase-correlation (DFT), Matching in DCT domain, Matching in wavelet domain
• Mesh-Based Motion Estimation Algorithms
Motion Estimation (ctd)
[Figure labels: Reference Frame, Current Frame, Current 16x16 Block, Motion Vector, Search Window, Sum of Absolute Difference (SAD)]
Distortion Criteria for Measuring the Distance Between the Previous Block and the Search-Area Block
• CCF (Cross-Correlation Function)
• MSE (Mean Square Error)
• MAE (Mean Absolute Error)
• SAD (Sum of Absolute Differences)
• PDC (Pixel Difference Classification)
• MAE (or MAD) and SAD are commonly employed due to their simplicity in hardware implementation.
SAD
\[
\mathrm{SAD}(dx, dy) = \sum_{m=x}^{x+N-1} \sum_{n=y}^{y+N-1} \left| I_k(m, n) - I_{k-1}(m + dx,\, n + dy) \right|
\]
\[
(MV_x, MV_y) = \arg\min_{(dx, dy) \in R^2} \mathrm{SAD}(dx, dy)
\]
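A minimal sketch of the SAD criterion and of an exhaustive search that minimises it may help make the notation concrete. It assumes frames are stored as lists of rows of integer luminance values, so a pixel I(m, n) is read as frame[n][m]. The function names `sad` and `full_search`, the `search_range` parameter, and the rule of skipping candidates that fall outside the reference frame are illustrative choices, not taken from the paper.

```python
def sad(cur, ref, x, y, dx, dy, n=16):
    """Sum of absolute differences between the n x n current block whose
    top-left corner is at column x, row y, and the reference block
    displaced by (dx, dy)."""
    total = 0
    for row in range(y, y + n):
        for col in range(x, x + n):
            total += abs(cur[row][col] - ref[row + dy][col + dx])
    return total


def full_search(cur, ref, x, y, search_range=7, n=16):
    """Evaluate every displacement in [-search_range, search_range]^2 and
    return (dx, dy, sad) for the smallest SAD found."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates whose reference block leaves the frame.
            if x + dx < 0 or y + dy < 0:
                continue
            if x + dx + n > width or y + dy + n > height:
                continue
            cost = sad(cur, ref, x, y, dx, dy, n)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best
```

With a search range of 7, up to 225 candidate positions are evaluated per block, each costing n x n absolute differences, which is why the full search is expensive.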
Search Algorithms
[Diagram: search algorithms grouped under the labels FAST, MULTISTEP (3SS, 4SS, HBS, UDS), EXHAUSTIVE (SE, MSE, VF, PFGSE), and FULL]
Search Algorithms (ctd)
• There is a trade-off between run time and accuracy.
• Full search is the most accurate because it is exhaustive, but it requires the most time.
• Fast search is faster, but its accuracy is reduced because only a subset of the candidate positions is examined.
Full-Search
• Not suitable for real time.
Exhaustive Search
• Simplest algorithm, but computationally the most expensive.
Three Step Search (3SSA)
3SSA Block Matching
► Three-Step Search (3SS)
– 9 points: the central point and its 8 surroundings
– Distance: w/2
– Find the best match
– Use the previous best as the center
– Halve the distance, select 8 new points
– Repeat the algorithm 3 times
– Examines 25 points
– Assumes a uniform distribution of MVs
[Figure: search points labelled 1, 2, and 3 by the step in which they are examined]
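The three-step procedure listed above can be sketched as follows. This is an illustrative outline, not the presenters' implementation: `three_step_search` and its `cost` argument are hypothetical names, `cost(dx, dy)` stands for any matching criterion such as SAD, and an initial step of 4 is assumed (half of a search range of about 7 or 8). Handling of candidates outside the frame is left to `cost`.

```python
def three_step_search(cost, step=4):
    """Three-step search around (0, 0).

    cost(dx, dy) returns the matching cost of a displacement.
    Returns ((dx, dy), best_cost, points_examined).
    """
    cx, cy = 0, 0
    best = cost(0, 0)
    examined = 1
    while step >= 1:
        bx, by = cx, cy
        # Check the 8 neighbours of the current centre at the current distance.
        for oy in (-step, 0, step):
            for ox in (-step, 0, step):
                if ox == 0 and oy == 0:
                    continue
                c = cost(cx + ox, cy + oy)
                examined += 1
                if c < best:
                    best, bx, by = c, cx + ox, cy + oy
        # Recentre on the best match and halve the distance.
        cx, cy = bx, by
        step //= 2
    return (cx, cy), best, examined
```

Starting from a step of 4, the loop runs for steps 4, 2, and 1, so it examines 1 + 3 x 8 = 25 points, which agrees with the count on the slide.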
4SSA
Unrestricted Center-Biased Diamond Search Algorithm (UDSA)
Hexagon-Based Search Algorithm (HBSA)
Problem Definition
• The high computational requirement of the Full Search (FS) algorithm does not allow it to work in real-time applications, despite its high accuracy.
• Fast block motion estimation algorithms have lower computational complexity, but also lower accuracy.
• Fast block motion estimation algorithms are therefore chosen for real-time applications, and they are also considered in this paper.
Aim
• To reduce the computational complexity of the full-search algorithm and of some of the fast block motion estimation algorithms while maintaining their accuracy.
• To make the best use of a Single Instruction Multiple Data (SIMD) architecture and to take advantage of byte-type data parallelism to further accelerate the execution of the algorithms.
Limitation
• If the partial sums used by an algorithm are wider than 8 bits, the partial sums for a reference block cannot be stored, accessed, and manipulated in a contiguous memory space, since partial sums of other reference blocks lie in between. As a result, a large number of CPU cycles are lost in manipulating these data, and such algorithms are not suitable for SIMD implementations.
Procedure
• Devise a scheme that uses only 8-bit partial sums and discards as many SAD computations as possible, without excluding the optimal motion vector.
– The proposed partial sums can be utilized not only in the full-search algorithm but also in some of the fast block motion-estimation algorithms.
• Devise a scheme that generalises the previous scheme to the multi-level case and utilises it optimally.
Partial Sums
268 + 483
• Add the hundreds (200 + 400): 600
• Add the tens (60 + 80): 140
• Add the ones (8 + 3): 11
• Add the partial sums (600 + 140 + 11): 751
8-Bit Partial Sums - Objective
• The objective of this paper is to find new partial sums of only eight bits, so that they can be of the packed byte type on an SIMD architecture.
• In this way, eight additions or subtractions on the partial sums can be executed in one SIMD instruction.
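To illustrate what "packed byte type" means, the sketch below emulates a 64-bit register holding eight 8-bit lanes with a plain Python integer and adds all eight lanes in one integer operation. It is only an emulation of the idea, not real SIMD code and not the paper's implementation; the names `pack`, `unpack`, and `packed_add` are hypothetical, and the addition wraps modulo 256 in each lane.

```python
LANES = 8
HIGH = 0x8080808080808080  # top bit of every byte lane
LOW = 0x7F7F7F7F7F7F7F7F   # low seven bits of every byte lane


def pack(values):
    """Pack eight 8-bit values into one 64-bit integer (lane 0 is the lowest byte)."""
    if len(values) != LANES or not all(0 <= v <= 255 for v in values):
        raise ValueError("expected eight values in the range 0..255")
    return int.from_bytes(bytes(values), "little")


def unpack(word):
    """Split a 64-bit integer back into its eight byte lanes."""
    return list(word.to_bytes(LANES, "little"))


def packed_add(a, b):
    """Add the eight byte lanes of a and b at once, wrapping modulo 256 per lane.

    The low seven bits are added without any carry crossing a lane boundary,
    and the top bit of each lane is then fixed up with an exclusive OR.
    """
    return ((a & LOW) + (b & LOW)) ^ ((a ^ b) & HIGH)
```

Values wider than 8 bits would not fit this layout: fewer of them fit in a register, so fewer operations are performed per instruction.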
8-bit Partial Sums
[Figure: 16 × 16 block with indices 0 to 15 and the partial sums ∑(n)]
Lower Bound
Scheme One - Algorithm
• Step 1) Initialization
a) Compute all of the 8-bit partial sums of sixteen luminance values for the current frame and save them in a contiguous memory space.
b) Retrieve all of the 8-bit partial sums of sixteen luminance values for the reference frame from the contiguous memory space in which they were saved.
Scheme One - Algorithm (ctd)
• Step 2) For every current block, execute the block motion-estimation process.
– Step 2.1) Initialization
– Step 2.2) Search
• For (each search location in a motion-estimation algorithm)
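The slides do not reproduce the lower-bound formula, so the following is a sketch under stated assumptions rather than the paper's exact scheme. It assumes each 8-bit partial sum is the sum of the sixteen luminance values in one row of a 16 x 16 block with its four low bits dropped (integer division by 16). Under that assumption, a row's SAD is at least max(0, 16·|Pc − Pr| − 15), where Pc and Pr are the current and reference partial sums, because the row sums differ by at least that much and the row SAD is never smaller than the difference of the row sums. All function names are hypothetical, and the reference partial sums are recomputed on the fly here for brevity, whereas Step 1 stores them in advance.

```python
def row_partial_sums(block):
    """One 8-bit partial sum per row: (sum of 16 pixels) >> 4. Assumed definition."""
    return [sum(row) >> 4 for row in block]


def lower_bound(ps_cur, ps_ref):
    """Lower bound on the block SAD computed from 8-bit partial sums only."""
    return sum(max(0, 16 * abs(a - b) - 15) for a, b in zip(ps_cur, ps_ref))


def block_sad(cur_block, ref_block):
    """Full SAD between two blocks of equal size."""
    return sum(abs(a - b)
               for cur_row, ref_row in zip(cur_block, ref_block)
               for a, b in zip(cur_row, ref_row))


def search_with_pruning(cur_block, candidates):
    """candidates yields (motion_vector, ref_block) pairs.

    Returns (best_mv, best_sad, full_sads_computed). A candidate is skipped
    when its lower bound already reaches the best SAD found so far, so the
    minimum SAD is never missed.
    """
    ps_cur = row_partial_sums(cur_block)
    best_mv, best_sad, computed = None, None, 0
    for mv, ref_block in candidates:
        bound = lower_bound(ps_cur, row_partial_sums(ref_block))
        if best_sad is not None and bound >= best_sad:
            continue  # this candidate cannot beat the current best
        computed += 1
        cost = block_sad(cur_block, ref_block)
        if best_sad is None or cost < best_sad:
            best_mv, best_sad = mv, cost
    return best_mv, best_sad, computed
```

The saving comes from replacing 256 absolute differences per rejected candidate with 16 comparisons on byte-sized values, which is the part that maps onto packed-byte SIMD instructions.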
Scheme One- Flow Chart
Multilevel 8-bit Partial Sums
[Figure: 16 × 16 block]
Multi-level Visualisation
Partial Sum Pyramid
[Figure: partial sum pyramid with Levels 1 to 4; sizes shown: 8 x 16, 4 x 16, 2 x 16, 1 x 16]
Multilevel 8-bit Partial Sums- Upper Bound (UB)
Scheme Two - Algorithm
• Step 1) Initialization
a) Compute all of the 8-bit partial sums of levels one and four for the current frame and save them in a contiguous memory space.
b) Retrieve all of the 8-bit partial sums of levels one and four for the reference frame from the contiguous memory space in which they were saved.
Scheme Two - Algorithm (ctd)
• Step 2) For every current block, execute the block motion-estimation process.
– Step 2.1) Initialization
– Step 2.2) Search
• For (each search location in a motion-estimation algorithm)
Scheme Two- Flow Chart
Possible Conditions
Condition 1:
Condition 2:
Condition 3:
Condition 4:
Possible Combinations
AVERAGE EXECUTION TIME (IN MILLISECONDS) PER FRAME FOR VARIOUS METHODS
Results
SIMD
COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING FSA
COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING SEA
COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING 3SSA
COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING 4SSA
COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING UDSA
COMPUTATIONAL COMPLEXITY AND AVERAGE NUMBER OF CPU CYCLES PER BLOCK USING HBSA
THE PERCENTAGE OF SPEEDUP OFFERED BY SIMD IMPLEMENTATION FOR A MOTION ESTIMATION ALGORITHM WITH SCHEME 2 INCORPORATED
Conclusion
A new technique based on 8-bit partial sums was introduced.
The partial sums were used to make the best use of the SIMD architecture, thereby improving the speed of motion estimation algorithms.
Since these partial sums have only 8 bits, eight of them can be processed concurrently using a single 64-bit SIMD register.
The notion of the 8-bit partial sums has then been extended to the four-level case, and it has been shown that there are 15 possible methods of utilizing these multilevel partial sums to accelerate the block motion-estimation algorithms without any loss of accuracy.
The full-search algorithm has then been used to determine which one of these 15 methods provides the lowest computational complexity, so that it can be chosen to accelerate various motion-estimation algorithms.
Extensive simulations have been carried out to find the average number of CPU cycles needed per block for various algorithms incorporating the chosen method.
These simulations have shown that the proposed scheme is capable of providing a substantial speed-up for the various existing motion-estimation algorithms through the reduction of their computational complexities.
The simulation results also demonstrate that the implementation on an SIMD architecture can further accelerate the proposed scheme by more than 93%.