DoCoMo USA Labs All Rights Reserved Sandeep Kanumuri, NML Fast super-resolution of video sequences using sparse directional transforms* Sandeep Kanumuri


Fast super-resolution of video sequences using sparse directional transforms*

Sandeep Kanumuri, Onur G. Guleryuz

DoCoMo USA Labs

*Presented at 2008 SIAM Conference on Imaging Science on 07/09/2008



Outline

• System Model

• Motivation

• Prior Work

• Our Solution: SWAT (Sparse Warped Transform and Adaptive Thresholding)

– Algorithm Flowchart

– Over-complete Transform

– Warped (Directional) Transform

– Over-complete Inverse Transform

– Adaptive Thresholding

• Performance Comparison

• Conclusion


System Model

• Design goals

1. High-Quality Rendering

2. Fast Algorithm (Lower Complexity)

– Single Frame, Simple Transform


Motivation


Broadcast Video – TV application

Docking station

Low-resolution video signal for mobile phones

Low-resolution video is sent to the docking station

Docking station uses the SWAT algorithm to convert low-resolution video to high-resolution video

High-resolution video is sent to a TV or a large display

BENEFIT: Broadcast programming aimed at mobile phones can also be used in stationary environments

Path A (A.1, A.2): via the docking station, as described above

Path B: Low-resolution video is converted to high-resolution video by the cell phone itself using the SWAT algorithm, and the high-resolution video is transmitted to the TV using local wireless technologies

Only one path (Path A or Path B) is used


Broadcast Video – VGA phones

Low-resolution video signal for mobile phones

BENEFIT: SWAT capability allows this cell phone to convert low-resolution video to high-resolution video

VGA phone with SWAT capability

VGA phone without SWAT capability


More Applications…

• Video Quality Enhancement Service

– The SWAT algorithm can be deployed as a service to enhance the resolution and quality of videos

• Video Conferencing

– A SWAT-equipped terminal can show video at a higher zoom level and with improved quality

• High-quality Image Zooming

– The SWAT algorithm enables a mobile phone to convert a low-quality, low-resolution image into a high-quality, high-resolution image


Prior Work

• Linear solutions

– Filter design

• Non-linear solutions

– Regularization (Projection onto the model space)

  • Signal Sparsity

    – Iterated Denoising / Shrinkage

    – Lp-Norm Minimization

  • Optical Flow

  • Adaptive filtering

  • Example-based approaches

– Data Consistency (Projection onto the input space)


SWAT Algorithm Flowchart

Input Image/Video (low-resolution, low quality)

→ Linear Interpolation Filter (result: high-resolution, low quality)

→ Regularization: Directional Over-complete Transform → Adaptive Thresholding → Directional Over-complete Inverse Transform

→ Enforce Data Consistency

→ More iterations? yes: repeat Regularization and Data Consistency; no: Output Image/Video (high-resolution, high quality)
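The flowchart amounts to one interpolation followed by a loop of regularization and data consistency. The following is a minimal Python/NumPy sketch of that control flow, not the authors' implementation: `swat_style_loop`, `downsample`, `interpolate`, and `regularize` are hypothetical names, the 2x2 averaging and pixel-replication filters are simple stand-ins (not the filters used by SWAT), and `regularize` is only a placeholder for the directional transform, thresholding, and inverse transform steps.

```python
import numpy as np

def swat_style_loop(y_low, interpolate, downsample, regularize, iterations=2):
    """Flowchart control flow: interpolate once, then iterate regularize + data consistency."""
    x = interpolate(y_low)  # initial high-resolution, low-quality estimate
    for _ in range(iterations):
        x = regularize(x)  # stands in for transform -> thresholding -> inverse transform
        x = x + interpolate(y_low - downsample(x))  # enforce data consistency
    return x

# Hypothetical stand-in filters for a 2x scale factor (assumptions, not SWAT's filters).
def downsample(x):
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))  # average each 2x2 block

def interpolate(y):
    return np.kron(y, np.ones((2, 2)))  # pixel replication

def regularize(x):
    return 0.9 * x + 0.1 * x.mean()  # placeholder for sparse directional denoising

y = np.arange(12, dtype=float).reshape(3, 4)
x = swat_style_loop(y, interpolate, downsample, regularize)
print(x.shape)  # (6, 8)
```

Because the loop ends with the data-consistency step and these stand-in filters satisfy downsample(interpolate(r)) = r, `downsample(x)` reproduces `y` exactly; with other filter pairs the constraint is generally met only approximately, which is one reason to iterate.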


Linear Interpolation Filter

• A linear interpolation filter is used to form an initial estimate of the high-resolution image/video

– However, the quality of this interpolation is relatively low

• Popular filter choices

– Low-pass filter of the Daubechies 7/9 inverse wavelet transform

– H.264 interpolation filter

• A customized linear interpolation filter can be used if any of the following is known:

– The downsampling filter (if the input was obtained by downsampling a higher-resolution original)

– The filtering caused by the camera acquisition process
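As an illustration of the H.264 option, the sketch below upsamples by 2 using the H.264/AVC 6-tap half-sample luma filter (1, -5, 20, 20, -5, 1)/32, applied separably along rows and then columns. It is a floating-point sketch under stated assumptions: the integer rounding and clipping of the standard are omitted, edge samples are replicated at the borders, and `upsample_1d` / `upsample_2d` are hypothetical names.

```python
import numpy as np

# 6-tap half-sample luma filter from H.264/AVC: (1, -5, 20, 20, -5, 1) / 32
HALF_PEL_TAPS = np.array([1, -5, 20, 20, -5, 1], dtype=float) / 32.0

def upsample_1d(x):
    """2x upsampling: keep input samples, insert a filtered half-sample after each one."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    out = np.empty(2 * n)
    out[0::2] = x  # integer positions keep the input samples
    padded = np.pad(x, (2, 3), mode="edge")  # assumption: replicate edges at the borders
    for i in range(n):
        # half-sample between x[i] and x[i+1] uses x[i-2] .. x[i+3]
        out[2 * i + 1] = padded[i:i + 6] @ HALF_PEL_TAPS
    return out

def upsample_2d(img):
    """Separable 2x upsampling: filter along rows, then along columns."""
    rows = np.apply_along_axis(upsample_1d, 1, np.asarray(img, dtype=float))
    return np.apply_along_axis(upsample_1d, 0, rows)

print(upsample_1d(np.arange(10))[9])  # 4.5
```

The taps sum to 1, so flat regions are preserved, and the symmetric taps reproduce linear ramps exactly away from the borders (the half-sample between 4 and 5 is 4.5).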


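Core idea – Exploit Signal Sparsity

[Figure: a signal $s(n)$, $n = 0, \dots, N-1$ (signal domain), and its coefficients $S(k)$, $k = 0, \dots, N-1$ (sparse decomposition domain)]

• The observed coefficients are the sparse signal coefficients plus "noise": $C(k) = S(k) + W(k)$

• Coefficients inside the band $[-T, +T]$ are set to zero, giving the denoised coefficients: $\hat{C}(k) = C(k)$ if $|C(k)| > T$, otherwise $\hat{C}(k) = 0$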
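A small numeric sketch of this idea, assuming hard thresholding (consistent with the $\pm T$ band) and an orthonormal 1-D DCT as the sparse decomposition; the coefficient values, noise, and threshold are made up for illustration.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k is the k-th basis function."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

N, T = 8, 1.0
S = np.zeros(N)
S[0], S[2] = 10.0, -6.0                                  # sparse signal coefficients S(k)
W = 0.3 * np.array([1, -1, 1, -1, 1, -1, 1, -1], float)  # illustrative "noise" W(k)
C = S + W                                                # observed coefficients C(k)

C_hat = np.where(np.abs(C) > T, C, 0.0)  # hard thresholding: zero everything inside [-T, T]
s_hat = dct_matrix(N).T @ C_hat          # denoised signal via the inverse (transposed) DCT

err_noisy = np.sum((C - S) ** 2)
err_denoised = np.sum((C_hat - S) ** 2)
print(np.count_nonzero(C_hat), round(err_noisy, 2), round(err_denoised, 2))  # 2 0.72 0.18
```

Only the noise that lands on the two significant coefficients survives, so the squared error drops from 0.72 to 0.18; since the DCT is orthonormal, the same errors hold in the signal domain.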


Over-complete Transform

• Transform size: 4x4 (used for description), 3x3

• Transform used: DCT, Hadamard

• For an Over-complete Transform

– All possible 4x4 blocks in the image/frame are selected using a non-directional mask

– Each 4x4 block undergoes a transform to produce a set of transformed coefficients

– Each pixel is involved in multiple transforms (16 for interior pixels, fewer near the image borders)

– Total number of transformed coefficients ≈ 16 × number of pixels

• Directional Over-complete Transform

– Here, each of the 4x4 blocks is formed by applying a directional mask followed by a warping process (see next slide)

[Figure: Blocks of an Over-complete Transform, from Block (1,1), Block (1,2), …, Block (1,W-3) in the first row down to Block (H-3,1), Block (H-3,2), …, Block (H-3,W-3) in the last row, each selected with a non-directional 4x4 mask; H = height of image, W = width of image]
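The counting on this slide can be checked with a short sketch: take every 4x4 block at every position (step 1), apply a 2-D orthonormal DCT, and count how many blocks cover each pixel. The function names are hypothetical, the DCT stands in for either of the listed transforms, and 0-based positions 0..H-4 correspond to the slide's Block (1,1) … Block (H-3,W-3).

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k is the k-th basis function."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def overcomplete_dct(img, size=4):
    """2-D DCT of every size x size block at every position (H-3 by W-3 blocks for size 4)."""
    d = dct_matrix(size)
    h, w = img.shape
    return {(r, c): d @ img[r:r + size, c:c + size] @ d.T
            for r in range(h - size + 1) for c in range(w - size + 1)}

def coverage(h, w, size=4):
    """Number of blocks that contain each pixel."""
    cov = np.zeros((h, w), dtype=int)
    for r in range(h - size + 1):
        for c in range(w - size + 1):
            cov[r:r + size, c:c + size] += 1
    return cov

img = np.arange(64, dtype=float).reshape(8, 8)
blocks = overcomplete_dct(img)
cov = coverage(8, 8)
print(len(blocks), 16 * len(blocks), cov[3, 3], cov[0, 0])  # 25 400 16 1
```

For an 8x8 image there are 5 × 5 = 25 blocks and 400 coefficients (versus 64 pixels); an interior pixel lies in 16 blocks while a corner pixel lies in only one, which is why the ratio approaches 16 only for large images.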


Warped (Directional) Transform

• Signal sparsity in the DCT domain holds for horizontal and vertical edges, but is violated on directional edges

• For the Directional Over-complete Transform, directional masks replace the non-directional mask

– Transform domain: 4x4 DCT

– Transform support is warped

• Example: consider 4 blocks along a directional edge

– First, using non-directional masks

– Then, using directional masks

– Directional masks lead to a sparse representation

[Figure: non-directional mask vs. directional masks]
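A small experiment illustrates the sparsity argument. The sketch assumes a hypothetical 45-degree directional mask in which row i of the block is read i pixels further to the right (a shear), so the warped support follows a diagonal step edge; `warped_block` and the synthetic image are illustrative, not the masks used in SWAT.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k is the k-th basis function."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def block_dct(block):
    d = dct_matrix(block.shape[0])
    return d @ block @ d.T

def warped_block(img, y0, x0, size=4, shift_per_row=1):
    """Hypothetical 45-degree directional mask: row i is read starting
    shift_per_row * i pixels further right, so the support follows the edge."""
    return np.array([[img[y0 + i, x0 + j + shift_per_row * i] for j in range(size)]
                     for i in range(size)])

def count_nonzero(coeffs, tol=1e-9):
    return int(np.sum(np.abs(coeffs) > tol))

yy, xx = np.mgrid[0:12, 0:12]
img = (xx >= yy).astype(float)  # step edge along the diagonal x = y

n_plain = count_nonzero(block_dct(img[4:8, 4:8]))             # non-directional block on the edge
n_warped = count_nonzero(block_dct(warped_block(img, 4, 2)))  # warped block on the same edge
print(n_plain, n_warped)
```

In the warped block every column is constant, so only the first row of its 2-D DCT is non-zero and `n_warped` is 3; the non-directional block straddles the edge as a triangle and needs more coefficients.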


How to choose a mask?

• The decision is made for each block (4x4) of pixels

– At each pixel, a vote is cast for the mask that minimizes the signal variance along the mask direction

– The mask with the most votes is chosen

• Voting over the block reduces inconsistency in the chosen directions

[Figure: Example masks]
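A minimal sketch of the voting rule, under assumptions: the candidate set is four hypothetical directions (horizontal, vertical, diagonal, anti-diagonal), the variance at each pixel is measured over three samples along the direction, and pixels whose variances are all equal (flat neighbourhoods) abstain. None of these details are specified on the slide.

```python
import numpy as np

# Hypothetical candidate directions as (dy, dx) steps along each mask.
DIRECTIONS = {"horizontal": (0, 1), "vertical": (1, 0),
              "diagonal": (1, 1), "anti-diagonal": (1, -1)}

def choose_direction(img, y0, x0, size=4):
    """Per-pixel votes for the minimum-variance direction; the block takes the majority."""
    names = list(DIRECTIONS)
    votes = dict.fromkeys(names, 0)
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            variances = []
            for name in names:
                dy, dx = DIRECTIONS[name]
                samples = [img[y - dy, x - dx], img[y, x], img[y + dy, x + dx]]
                variances.append(np.var(samples))
            if max(variances) == min(variances):
                continue  # no direction is preferred here; this pixel abstains
            votes[names[int(np.argmin(variances))]] += 1  # ties go to the first listed
    return max(votes, key=votes.get), votes

yy, xx = np.mgrid[0:12, 0:12]
img = (xx >= yy).astype(float)  # step edge along the diagonal x = y
best, votes = choose_direction(img, 4, 4)
print(best)  # diagonal
```

Pixels right at the edge vote for the diagonal mask; a few pixels next to the edge see zero variance in several directions and vote for the first tied one, which is the kind of per-pixel inconsistency the block-level majority vote smooths out.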


Over-complete Inverse Transform

• For an Over-complete Inverse Transform

– Each set of transformed coefficients is converted back to the pixel domain

– Each pixel has multiple estimates from different blocks, and a weighted combination is used to arrive at its final estimate

[Figure: block estimates combined with weights W1, W2, W3, and so on for all the blocks]
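The forward transform, thresholding, and weighted inverse can be combined in one sketch. The slide does not give the weights; the choice below, weight = 1 / (1 + number of non-zero coefficients) so that sparser blocks count more, is an assumption borrowed from common practice in overcomplete denoising, and `overcomplete_denoise` is a hypothetical name.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix: row k is the k-th basis function."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def overcomplete_denoise(img, T, size=4):
    """Threshold every block's DCT, invert, and blend the per-pixel estimates with weights."""
    d = dct_matrix(size)
    h, w = img.shape
    acc = np.zeros((h, w))   # weighted sum of the per-block estimates
    wsum = np.zeros((h, w))  # sum of the weights at each pixel
    for r in range(h - size + 1):
        for c in range(w - size + 1):
            coef = d @ img[r:r + size, c:c + size] @ d.T
            coef[np.abs(coef) <= T] = 0.0  # hard thresholding
            est = d.T @ coef @ d           # inverse 2-D DCT of this block
            weight = 1.0 / (1 + np.count_nonzero(coef))  # assumption: sparser blocks count more
            acc[r:r + size, c:c + size] += weight * est
            wsum[r:r + size, c:c + size] += weight
    return acc / wsum

rng = np.random.default_rng(0)
img = rng.uniform(0, 255, size=(8, 8))
print(np.allclose(overcomplete_denoise(img, T=0.0), img))  # True
```

With a zero threshold every block estimate equals the original pixels, so any weighted combination reproduces the image; with a positive threshold the weights decide how the differing block estimates are blended.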


Adaptive Thresholding

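• Transform coefficients are thresholded for denoising

• A master threshold ($T$) is used for an initial pass

• A local threshold ($\hat{T}$) is calculated and finally used:

$\hat{T} = f(E_{lost})\, T$

– $E_{lost}$: energy lost due to thresholding when $T$ is used as the threshold

• Parameters $f_1$ to $f_n$ and $E_1$ to $E_n$ are tuned to achieve a local optimum

[Figure: $f(E_{lost})$ plotted against $E_{lost}$, with breakpoints $E_1, E_2, \dots, E_n$ on the horizontal axis and levels $1, f_1, f_2, \dots, f_n$ on the vertical axis]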
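A sketch of the two-pass rule $\hat{T} = f(E_{lost})\, T$ applied to one set of coefficients. The exact shape of $f$ is not recoverable from the slide, so a piecewise-constant $f$ is assumed: 1.0 below $E_1$, then one level per interval. `E_BREAKS`, `F_LEVELS`, and `adaptive_threshold` are hypothetical names with made-up values, and hard thresholding is assumed.

```python
import bisect
import numpy as np

# Hypothetical tuning parameters (the real f_1..f_n and E_1..E_n are tuned offline).
E_BREAKS = [5.0, 20.0]        # E_1, E_2
F_LEVELS = [1.0, 0.75, 0.5]   # f(E) below E_1, on [E_1, E_2), and at or above E_2

def f(e_lost):
    """Assumed piecewise-constant f(E_lost)."""
    return F_LEVELS[bisect.bisect_right(E_BREAKS, e_lost)]

def adaptive_threshold(coeffs, T):
    coeffs = np.asarray(coeffs, dtype=float)
    small = np.abs(coeffs) <= T                  # pass 1: master threshold T
    e_lost = float(np.sum(coeffs[small] ** 2))   # energy that pass would discard
    t_hat = f(e_lost) * T                        # local threshold T_hat = f(E_lost) * T
    return np.where(np.abs(coeffs) > t_hat, coeffs, 0.0), t_hat, e_lost

out, t_hat, e_lost = adaptive_threshold([10.0, 3.5, -2.0, 0.5], T=4.0)
print(out, t_hat, e_lost)
```

Here the master threshold would discard 16.5 units of energy, so this particular $f$ lowers the threshold to 3.0 and the coefficient 3.5 survives; the levels and breakpoints control that trade-off.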


Enforcing Data Consistency

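• Role of the data consistency module

– Ensure that the high-resolution estimate, when downsampled, can produce the low-resolution input

[Figure: Data Consistency module. The high-resolution input passes through the downsampling filter and is subtracted from the low-resolution input; the difference passes through the linear interpolation filter and is added to the high-resolution input to give the high-resolution output.]

$\hat{x}_{out} = \hat{x}_{in} + U\left(y - D\,\hat{x}_{in}\right)$, where $D$ is the downsampling filter, $U$ is the linear interpolation filter, $\hat{x}_{in}$ is the high-resolution input, and $y$ is the low-resolution input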
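The module's block diagram translates directly into a few lines. In this sketch `downsample` (2x2 block average) and `interpolate` (pixel replication) are hypothetical stand-ins for $D$ and $U$, chosen so that $D U$ is the identity and the correction is exact after one pass.

```python
import numpy as np

def downsample(x):
    """Hypothetical D: average each 2x2 block."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def interpolate(y):
    """Hypothetical U: pixel replication."""
    return np.kron(y, np.ones((2, 2)))

def enforce_data_consistency(x_high, y_low):
    residual = y_low - downsample(x_high)      # what the estimate fails to explain
    return x_high + interpolate(residual)      # push that residual back to high resolution

rng = np.random.default_rng(1)
x_high = rng.normal(size=(4, 6))
y_low = rng.normal(size=(2, 3))
out = enforce_data_consistency(x_high, y_low)
print(np.allclose(downsample(out), y_low))  # True
```

With a filter pair for which $D U \neq I$ (for example, a wavelet low-pass for $D$ and a longer interpolation filter for $U$), one pass only reduces the mismatch, which fits the iterative structure of the flowchart.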


Performance Comparison

• Super-resolution of QCIF to CIF sequences

– Low-pass filter from the Daubechies 7/9 wavelet filter bank

– Compression is done using the H.264/AVC codec (JM12.0)

• SWAT run with 2 iterations

• Compared with

– Bilinear interpolation

– H.264 interpolation

– Simple Inverse

– Iterated Denoising / Shrinkage (ID)

  • 2 iterations (complexity similar to SWAT)

  • 10 iterations
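QCIF frames are 176×144 and CIF frames are 352×288, so the task is a 2x upscaling in each dimension. The comparisons that follow are reported as PSNR; a minimal helper is sketched below, assuming 8-bit samples (peak 255). The slides do not say whether PSNR is measured on luma only or averaged over frames, so those choices are left to the caller; `psnr` is a hypothetical name.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB: 10 * log10(peak^2 / MSE)."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    mse = np.mean((ref - est) ** 2)
    if mse == 0:
        return float("inf")  # identical frames
    return 10.0 * np.log10(peak ** 2 / mse)

cif_original = np.zeros((288, 352))        # one CIF-sized luma frame
cif_estimate = cif_original + 1.0          # every sample off by one level
print(round(psnr(cif_original, cif_estimate), 2))  # 48.13
```

An error of one grey level everywhere (MSE = 1) gives 20 · log10(255) ≈ 48.13 dB, a useful sanity check when reading the PSNR figures.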


PSNR comparison (uncompressed)






Panels: H.264, ID (2 iterations), SWAT

Visual Comparison (uncompressed)


Panels: H.264, ID (2 iterations), SWAT

Visual Comparison (uncompressed)


PSNR comparison (compression at QP=20)


PSNR comparison (compression at QP=25)


Panels: H.264, SWAT

Visual Comparison (compression at QP=25)


Visual Comparison (compression at QP=25)

Panels: H.264, SWAT


Conclusion

• The SWAT algorithm renders high-quality output yet remains fast

– Quality comparable to ID (10 iterations)

– Complexity comparable to ID (2 iterations)

• Enabling Features

– Over-complete transform representation

– Simple basic transform (Hadamard, Integer DCT)

– Sparse warped transform

– Adaptive thresholding

– Weighted inverse transform

• Reference

– S. Kanumuri, O. G. Guleryuz and M. R. Civanlar, "Fast super-resolution reconstructions of mobile video using warped transforms and adaptive thresholding", SPIE Applications of Digital Image Processing XXX, August 2007

• Flicker Reduction Application

– To appear in SPIE 2008 (Applications of Digital Image Processing XXXI)

• E-mail:

– Sandeep Kanumuri ([email protected])

– Onur G. Guleryuz ([email protected])

