High Efficient Distributed Video Coding with Parallelized Design for Cloud Computing

(Chinese title, translated: Distributed Video Coding with Both High Efficiency and Parallelized Design for Cloud Architectures)

CMLab, CSIE, NTU

Cheng, Han-Ping (程瀚平)
Advisor: Prof. Wu, Ja-Ling (吳家麟)

2010/6/2

Outline

- Introduction
- DISPAC video codec
- RD performance of DISPAC
- Parallelizing DISPAC decoder
- Decoding speed of DISPAC
- Conclusions and future work

Trends of Cloud Computing

Cloud computing makes clients slimmer and thinner.

Video Coding in Cloud Computing

Clients only need a low-complexity encoder and a low-complexity decoder.

- Conventional video coding (e.g., H.264): encode once, decode many times, so the decoder is low complexity.
- Distributed Video Coding (DVC) (e.g., video surveillance, wireless sensor networks): the encoder is low complexity.

Distributed Video Coding

Slepian-Wolf Theorem (1973)

Wyner-Ziv Theorem (1976)

Conventional video coding paradigm: sources X and Y are statistically dependent and are coded by a joint encoder and a joint decoder, so the dependency is exploited and $R_X + R_Y \geq H(X, Y)$.

Distributed coding: X and Y are coded by separate encoders (Encoder X and Encoder Y) and decoded by a joint decoder. The dependency exists but is not exploited at the encoders, so coding each source on its own would need $R_X \geq H(X)$ and $R_Y \geq H(Y)$. What is the lowest achievable $R_X + R_Y$?

Slepian & Wolf: the sum rate $H(X, Y)$ is still achievable, i.e., $R_X + R_Y \geq H(X, Y)$ with $R_X \geq H(X|Y)$ and $R_Y \geq H(Y|X)$!!
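A small numeric illustration of the Slepian-Wolf bounds, assuming a made-up joint distribution of two binary sources that agree 90% of the time (these probabilities are hypothetical, not data from this work). It computes H(X), H(Y), H(X, Y) and H(X|Y), so the gap between separate coding, H(X) + H(Y), and the Slepian-Wolf sum rate H(X, Y) becomes visible.

```python
import math

# Hypothetical joint pmf of two correlated binary sources: X equals Y 90% of the time.
joint = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}


def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def marginal(pmf, axis):
    """Marginal pmf of X (axis=0) or Y (axis=1)."""
    out = {}
    for pair, p in pmf.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out


h_x = entropy(marginal(joint, 0).values())
h_y = entropy(marginal(joint, 1).values())
h_xy = entropy(joint.values())
h_x_given_y = h_xy - h_y  # chain rule: H(X|Y) = H(X, Y) - H(Y)

print(f"H(X)={h_x:.3f}  H(Y)={h_y:.3f}  H(X,Y)={h_xy:.3f}  H(X|Y)={h_x_given_y:.3f}")
print(f"separate coding: {h_x + h_y:.3f} bits, Slepian-Wolf sum rate: {h_xy:.3f} bits")
```

Here separate coding costs 2 bits per symbol pair, while H(X, Y) is about 1.469 bits and H(X|Y) about 0.469 bits.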

Distributed Video Coding

Wyner-Ziv Theorem (1976): extends the Slepian-Wolf result to lossy coding. DVC is therefore also called Wyner-Ziv (WZ) video coding.

Architecture (the dependency between X and Y is not exploited at the encoders; it is exploited at the joint decoder):
- Encoder X: Source X → Quantizer → Channel Encoder → parity bits P → Channel Decoder → X'
- Encoder Y: Source Y → Source Encoder → Source Decoder → Y
- Side information estimation (from Y) feeds the channel decoder. The side information behaves like X sent through a virtual noisy channel, and the parity bits P are used to correct it.

Channel coding (error control code) analogy: X is transmitted together with its parity P as X + P over a noisy channel; the receiver gets (X + P)' and decodes X'.

Rates: $R_Y \geq H(Y)$ and $R_X \geq H(X|Y)$, so what is the bound on $R_X + R_Y$? Wyner & Ziv: $H(X, Y)$! The correlation is exploited at the decoder.
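The idea of sending only parity (syndrome) bits and letting the decoder correct its side information can be shown with a toy example. This sketch is not the LDPCA code used in DISCOVER or DISPAC: it swaps in a Hamming(7,4) code and assumes X and Y differ in at most one of 7 bits, so the encoder sends a 3-bit syndrome instead of all 7 bits.

```python
# Parity-check matrix of the Hamming(7,4) code: column j is the binary form of j + 1.
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]


def syndrome(bits):
    """3-bit syndrome H * bits (mod 2) of a 7-bit word."""
    return [sum(h * b for h, b in zip(row, bits)) % 2 for row in H]


def wz_encode(x):
    """Encoder for X: transmit only the syndrome (3 bits instead of 7)."""
    return syndrome(x)


def wz_decode(s_x, y):
    """Joint decoder: correct side information y using the syndrome of x.
    Assumes x and y differ in at most one bit position."""
    diff = [a ^ b for a, b in zip(s_x, syndrome(y))]  # syndrome of (x XOR y)
    pos = diff[0] * 4 + diff[1] * 2 + diff[2]        # 1-based error position, 0 = none
    x_hat = list(y)
    if pos:
        x_hat[pos - 1] ^= 1
    return x_hat


x = [1, 0, 1, 1, 0, 0, 1]
y = [1, 0, 1, 1, 1, 0, 1]  # side information: bit 5 differs from x
print(wz_decode(wz_encode(x), y) == x)  # True
```

With at most one differing bit there are 8 possible differences, so 3 bits is exactly H(X|Y) when all 8 are equally likely.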

Video Coding in Cloud Computing

WZ to H.264 video transcoder:

WZ encoder (low complexity, client) → WZ encoded bitstream → WZ to H.264 transcoder (cloud computational resource) → H.264 encoded bitstream → H.264 decoder (low complexity, client)

Motivation

- There is still a coding-efficiency gap between Wyner-Ziv video coding and conventional video coding (e.g., H.264/AVC).
- Most reported WZ codecs have a long decoding delay.
- Given the trend toward parallel computing (e.g., multi-core CPUs and GPUs), the parallelizability of the decoder is essential.

DISPAC Video Codec

DIStributed video coding with PArallelized design for Cloud computing (DISPAC):
- To improve rate-distortion (RD) performance: combine coding tools developed in recent literature with some newly developed modules.
- To reduce decoding delay: a highly parallelized decoder.


DISPAC Video Codec

DISPAC combines the coding tools of two state-of-the-art WZ codecs:
- DISCOVER codec (DIStributed COding for Video sERvices): X. Artigas et al., “The DISCOVER codec: architecture, techniques and evaluation”, PCS, 2007.
- MLWZ codec (Motion-Learning based Wyner-Ziv video coding): R. Martin et al., “Statistical motion learning for improved transform domain Wyner-Ziv video coding”, IET Image Processing, 2010.

DISCOVER Video Codec

Ref. X. Artigas et al., PCS, 2007

Frame types:
- GOP 2: Key, WZ, Key, WZ, Key
- GOP 4: Key, WZ, WZ, WZ, Key

Quantization

Eight quantization matrices. Each entry is the number of quantization levels for the corresponding DCT coefficient band of a 4x4 block:

Q1:
16 8 0 0
8 0 0 0
0 0 0 0
0 0 0 0

Q2:
32 8 0 0
8 0 0 0
0 0 0 0
0 0 0 0

Q3:
32 8 4 0
8 4 0 0
4 0 0 0
0 0 0 0

Q4:
32 16 8 4
16 8 4 0
8 4 0 0
4 0 0 0

Q5:
32 16 8 4
16 8 4 4
8 4 4 0
4 4 0 0

Q6:
64 16 8 8
16 8 8 4
8 8 4 4
8 4 4 0

Q7:
64 32 16 8
32 16 8 4
16 8 4 4
8 4 4 0

Q8:
128 64 32 16
64 32 16 8
32 16 8 4
16 8 4 0

A band with 32 = 2^5 levels uses 5 bits, a band with 8 = 2^3 levels uses 3 bits, and a band with 0 levels uses 0 bits (not transmitted).
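As a check of the level-to-bit mapping, a short sketch that converts a quantization matrix into the number of bit planes per band. Rounding up with ceil(log2(levels)) is an assumption for level counts that are not powers of two; every entry in Q1 to Q8 is a power of two, so it is exact here.

```python
import math

# Q4 from the slide: number of quantization levels per DCT band (4x4 layout).
Q4 = [
    [32, 16, 8, 4],
    [16, 8, 4, 0],
    [8, 4, 0, 0],
    [4, 0, 0, 0],
]


def bits_per_band(levels):
    """Bit planes needed for a band with `levels` quantization levels (0 = not sent)."""
    return 0 if levels == 0 else math.ceil(math.log2(levels))


bit_planes = [[bits_per_band(q) for q in row] for row in Q4]
print(bit_planes)                 # [[5, 4, 3, 2], [4, 3, 2, 0], [3, 2, 0, 0], [2, 0, 0, 0]]
print(sum(map(sum, bit_planes)))  # 30 bit planes per WZ frame with Q4
```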

Quantization

DCT coefficient bands: every 4x4 block k (Block 1, Block 2, Block 3, …, Block N) holds 16 coefficients $S^{k}_{1}, \dots, S^{k}_{16}$, numbered within the block as:

1 2 6 7
3 5 8 13
4 9 12 14
10 11 15 16

Band b collects coefficient b from every block:

- DCT coefficient band b1 (DC band): { $S^{1}_{1}, S^{2}_{1}, S^{3}_{1}, \dots, S^{N}_{1}$ }
- DCT coefficient band b2 (AC band): { $S^{1}_{2}, S^{2}_{2}, S^{3}_{2}, \dots, S^{N}_{2}$ }
- …
- DCT coefficient band b16 (AC band): { $S^{1}_{16}, S^{2}_{16}, S^{3}_{16}, \dots, S^{N}_{16}$ }
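The regrouping of block coefficients into bands can be sketched as follows. Only the within-block numbering is taken from the slide; the function name and the placeholder coefficient values are hypothetical.

```python
# Index (1-based) of each position in a 4x4 block, as numbered on the slide.
ZIGZAG = [
    [1, 2, 6, 7],
    [3, 5, 8, 13],
    [4, 9, 12, 14],
    [10, 11, 15, 16],
]


def group_into_bands(blocks):
    """Given a list of 4x4 coefficient blocks, return a dict of 16 bands where
    band b (1-based) holds coefficient b of every block, in block order."""
    bands = {b: [] for b in range(1, 17)}
    for block in blocks:
        for r in range(4):
            for c in range(4):
                bands[ZIGZAG[r][c]].append(block[r][c])
    return bands


# Placeholder values: block k holds r * 4 + c + 100 * k at row r, column c.
blocks = [[[r * 4 + c + 100 * k for c in range(4)] for r in range(4)] for k in range(3)]
bands = group_into_bands(blocks)
print(bands[1], bands[16])  # [0, 100, 200] [15, 115, 215]
```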

Bit Plane Extraction

For each DCT coefficient band, the quantized values are split into bit planes from MSB to LSB, and each bit plane is channel encoded (LDPCA).

Example: with Q4 the DC band has 32 levels, i.e., 5 bits. The quantized DC values 4, 1, 0 and 30 are 00100, 00001, 00000 and 11110 in binary, which gives these bit planes of the DC band:

Bit plane 1 (MSB): 0 0 0 1
Bit plane 2: 0 0 0 1
Bit plane 3: 1 0 0 1
Bit plane 4: 0 0 0 1
Bit plane 5 (LSB): 0 1 0 0

Q4:
32 16 8 4
16 8 4 0
8 4 0 0
4 0 0 0
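A minimal sketch of bit plane extraction at the encoder and the matching reassembly at the decoder, using the four example DC values (5 bits each under Q4). Channel coding of each plane is left out; the function names are illustrative.

```python
def to_bit_planes(values, num_bits):
    """Split quantized values into bit planes, MSB first.
    Plane p (0-based) holds bit (num_bits - 1 - p) of every value."""
    return [[(v >> (num_bits - 1 - p)) & 1 for v in values] for p in range(num_bits)]


def from_bit_planes(planes):
    """Reassemble quantized values from MSB-first bit planes (decoder side)."""
    values = [0] * len(planes[0])
    for plane in planes:
        values = [(v << 1) | bit for v, bit in zip(values, plane)]
    return values


dc_band = [4, 1, 0, 30]  # example DC values from the slide, 5 bits each (Q4)
planes = to_bit_planes(dc_band, 5)
for i, plane in enumerate(planes, start=1):
    print(f"Bit plane {i}: {plane}")
assert from_bit_planes(planes) == dc_band
```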

DISCOVER Video Codec

Ref. X. Artigas et al., PCS, 2007

白育姍

(Figure: the Wyner-Ziv coding architecture, with Encoder X (quantizer and channel encoder sending parity bits P), Encoder Y (source encoder) and a joint decoder (source decoder, side information estimation, channel decoder); $R_Y \geq H(Y)$, $R_X \geq H(X|Y)$.)

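Side Information Creation

Side information is created from the backward and forward key frames $X_B$ and $X_F$:
1. Low-pass filter both key frames (3x3 mean filter).
2. Divide the frame into 16x16 non-overlapping blocks.
3. Motion estimation (search window: ±32) with the matching criteria

$$MAD(d_x, d_y) = \frac{1}{N} \sum_{(x, y) \in B} \left| X_F(x, y) - X_B(x + d_x, y + d_y) \right|$$

$$CF(d_x, d_y) = MAD(d_x, d_y)\left(1 + K\left(d_x^2 + d_y^2\right)\right)$$

where B is the current block, N is the number of pixels in B, and K weights the penalty on long motion vectors.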
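A direct, unoptimized sketch of the penalized full-search block matching above, with `cur` playing $X_F$ and `ref` playing $X_B$. Assumptions: frames are lists of pixel rows, blocks are square, the low-pass filtering step is omitted, candidates that leave the frame are skipped, and `k` is a free parameter because the slide does not give the value of K.

```python
def mad(cur, ref, bx, by, dx, dy, size):
    """MAD between the size x size block of `cur` at (bx, by) and the block of
    `ref` displaced by (dx, dy). Frames are lists of rows (ref[y][x])."""
    total = 0
    for y in range(by, by + size):
        for x in range(bx, bx + size):
            total += abs(cur[y][x] - ref[y + dy][x + dx])
    return total / (size * size)


def best_vector(cur, ref, bx, by, size, search, k):
    """Full search over [-search, search]^2 minimizing
    CF = MAD * (1 + k * (dx^2 + dy^2)). Candidates whose displaced block would
    leave the reference frame are skipped (an assumed border policy)."""
    height, width = len(ref), len(ref[0])
    best = None  # (cost, dx, dy)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            inside = (0 <= bx + dx and bx + dx + size <= width
                      and 0 <= by + dy and by + dy + size <= height)
            if not inside:
                continue
            cost = mad(cur, ref, bx, by, dx, dy, size) * (1 + k * (dx * dx + dy * dy))
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best[1], best[2]


# Synthetic check: cur is ref shifted by (dx, dy) = (2, 1), so that vector has zero cost.
ref = [[(x * 37 + y * 91 + x * y * 11) % 256 for x in range(24)] for y in range(24)]
cur = [[ref[min(y + 1, 23)][min(x + 2, 23)] for x in range(24)] for y in range(24)]
print(best_vector(cur, ref, bx=8, by=8, size=8, search=4, k=0.05))  # (2, 1)
```

The penalty term only breaks ties in favor of short vectors when MAD values are close; an exact match still wins because its cost is zero.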


Side Information Creation

Adaptive search range (between $X_B$ and $X_F$):

$$x_L - N \leq d_x \leq x_R + N, \qquad y_U - N \leq d_y \leq y_B + N$$

where $(x_L, y_L)$, $(x_R, y_R)$, $(x_U, y_U)$ and $(x_B, y_B)$ are the left, right, upper and bottom reference points marked in the figure, and N is the search margin.

Side Information Creation

Half-pixel motion estimation between $X_B$ and $X_F$.

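Side Information Creation

Spatial motion smoothing with a weighted vector median filter over the candidate motion vectors $x_1, \dots, x_9$ of the current block B and its neighbors (3x3 neighborhood in the figure, between $X_B$ and $X_F$):

$$x_{wvmf} = \arg\min_{x_j,\ 1 \leq j \leq 9} \sum_{i=1,\ i \neq j}^{9} w_i \left\lVert x_j - x_i \right\rVert, \qquad w_i = \frac{MSE(x_j, B)}{MSE(x_i, B)}$$

where $MSE(x_i, B)$ is the matching error of block B when motion vector $x_i$ is used.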

Side Information Creation

Weighted vector median filter, step 1: compute the matching error of block B for each candidate motion vector, e.g., $MSE_1 = MSE(x_1, B)$ and $MSE_2 = MSE(x_2, B)$.

Side Information Creation

Weighted vector median filter, step 2: evaluate the weighted distance sum for each candidate, e.g., for $x_1$:

$$\sum_{i=2}^{9} w_i \left\lVert x_1 - x_i \right\rVert = \frac{MSE_1}{MSE_2}\left\lVert x_1 - x_2 \right\rVert + \frac{MSE_1}{MSE_3}\left\lVert x_1 - x_3 \right\rVert + \dots + \frac{MSE_1}{MSE_9}\left\lVert x_1 - x_9 \right\rVert$$

Side Information Creation

Weighted vector median filter, step 3: the sum for $x_6$ is the minimum, so $x_{wvmf} = x_6$ (final motion vector).
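A minimal sketch of the weighted vector median filter as expanded on these slides for $x_1$: the cost of candidate $x_j$ is the sum over $i \neq j$ of $(MSE_j / MSE_i)\lVert x_j - x_i \rVert$. The Euclidean norm is an assumption (the slides do not name the norm), and all MSE values are assumed to be positive.

```python
import math


def wvmf(vectors, mse):
    """Weighted vector median filter following the slides' expansion:
    cost(j) = sum_{i != j} (MSE_j / MSE_i) * ||x_j - x_i||; return the argmin x_j.
    vectors: list of (dx, dy); mse: matching error of block B for each vector (> 0)."""
    best_j, best_cost = None, math.inf
    for j, xj in enumerate(vectors):
        cost = sum(
            (mse[j] / mse[i]) * math.dist(xj, xi)  # Euclidean norm (assumed)
            for i, xi in enumerate(vectors) if i != j
        )
        if cost < best_cost:
            best_j, best_cost = j, cost
    return vectors[best_j]


# Hypothetical neighborhood: one outlier vector among eight consistent ones.
vectors = [(10, 10)] + [(1, 0)] * 8
print(wvmf(vectors, [1.0] * 9))  # (1, 0)
```

The median-style cost rejects the isolated outlier, which is the purpose of spatial motion smoothing; a candidate with a low MSE gets a smaller cost multiplier and is favored.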

Side Information Creation

Bidirectional motion compensation between $X_B$ and $X_F$, with block interpolation $0.75 X_B + 0.25 X_F$.

DISCOVER Video Codec

Ref. X. Artigas et al., PCS, 2007

白育姍

Laplacian distribution

CNM Parameter Estimation

Residual frame generation from the motion-compensated key frames $X_F$ and $X_B$:

$$R(x, y) = \frac{X_F(x + d_{xf}, y + d_{yf}) - X_B(x + d_{xb}, y + d_{yb})}{2}$$

CNM Parameter Estimation

Residual frame DCT transform (4x4):

$$T_n(u, v) = DCT\left[R_n(x, y)\right]$$

(Figure: the transformed residual frame T, with example coefficient values such as 258, 120 and 200.)

The variance of each DCT band is then estimated: $\hat{\sigma}_1^2$ for band 1, $\hat{\sigma}_2^2$ for band 2, $\hat{\sigma}_3^2$ for band 3, and so on.
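A sketch of the two steps on these slides for one 4x4 block: form the residual from two blocks that are assumed to be already motion compensated, then apply a 4x4 DCT. The orthonormal floating-point DCT-II is used here for clarity; a real codec may use an integer approximation of the DCT instead, and the function names are illustrative.

```python
import math


def residual(xf_block, xb_block):
    """R(x, y) = (X_F - X_B) / 2 for one block. Both inputs are assumed to be
    blocks already fetched at their motion-compensated positions."""
    return [[(f - b) / 2 for f, b in zip(row_f, row_b)]
            for row_f, row_b in zip(xf_block, xb_block)]


def dct4x4(block):
    """Orthonormal floating-point 2-D DCT-II of a 4x4 block: T(u, v)."""
    n = 4

    def scale(k):
        return math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = 0.0
            for x in range(n):
                for y in range(n):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                          * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * s
    return out


xf = [[10] * 4 for _ in range(4)]
xb = [[4] * 4 for _ in range(4)]
T = dct4x4(residual(xf, xb))  # constant residual of 3: only the DC coefficient is nonzero
print(round(T[0][0], 6))      # 12.0
```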

CNM Parameter Estimation

CNM parameter computation for coefficient (u, v) of block n, which belongs to band b:

$$D_n(u, v) = \left| T_n(u, v) \right| - E\left[ \left| T_b \right| \right]$$

$$\hat{\alpha}_n(u, v) = \begin{cases} \sqrt{\dfrac{2}{\hat{\sigma}_b^2}}, & \left[D_n(u, v)\right]^2 \leq \hat{\sigma}_b^2 \\ \sqrt{\dfrac{2}{\left[D_n(u, v)\right]^2}}, & \left[D_n(u, v)\right]^2 > \hat{\sigma}_b^2 \end{cases}$$

Example: assume the variance and mean of band 1 are $\hat{\sigma}_1^2 = 1000$ and $E[|T_1|] = 150$.
- For $T = 258$: $D_n^2 = (|258| - 150)^2 = 108^2 > 1000$, so $\hat{\alpha} = \sqrt{2/108^2} \approx 0.0131$.
- For $T = 120$: $D_n^2 = (|120| - 150)^2 = 900 \leq 1000$, so $\hat{\alpha} = \sqrt{2/1000} \approx 0.0447$.
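The coefficient-level rule and the worked example above can be sketched as follows. The `band_stats` helper assumes the band variance is estimated as $E[|T_b|^2] - E[|T_b|]^2$; the slides only say "variance of band b", so that estimator is an assumption.

```python
import math


def band_stats(band_coeffs):
    """Mean and variance of |T| over one DCT band (assumed estimator:
    variance = E[|T|^2] - E[|T|]^2)."""
    mags = [abs(t) for t in band_coeffs]
    mean = sum(mags) / len(mags)
    var = sum(m * m for m in mags) / len(mags) - mean * mean
    return mean, var


def cnm_alpha(t, band_mean, band_var):
    """Coefficient-level Laplacian parameter from the slide:
    D = |T| - E[|T_b|]; alpha = sqrt(2/var) if D^2 <= var else sqrt(2/D^2)."""
    d2 = (abs(t) - band_mean) ** 2
    return math.sqrt(2 / band_var) if d2 <= band_var else math.sqrt(2 / d2)


# Worked example from the slide: band 1 with variance 1000 and mean 150.
print(round(cnm_alpha(258, 150, 1000), 4))  # 0.0131  (108^2 > 1000)
print(round(cnm_alpha(120, 150, 1000), 4))  # 0.0447  (30^2 <= 1000)
```

Coefficients far from the band mean get a smaller $\hat{\alpha}$, i.e., a wider Laplacian, which tells the decoder to trust the side information less there.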


DISCOVER Video Codec


Ref. X. Artigas et al., PCS, 2007

白育姍

Correlation Noise Distribution Modeling

The difference between a WZ coefficient X and its side information Y is modeled by a Laplacian distribution whose parameter is the CNM parameter $\alpha$:

$$f(X - Y) = \frac{\alpha}{2} e^{-\alpha \left| X - Y \right|}$$


DISCOVER Video Codec


Ref. X. Artigas et al., PCS, 2007

白育姍

Conditional Bit Prob Computation

$P(B_k = 1 \mid Y, B^{k-1})$: the probability that the k-th bit is 1 given the side information Y and the k−1 previously decoded bits $B^{k-1}$.

(Figure: a QCIF frame divided into 176/4 × 144/4 WZ blocks; for each coefficient, the Laplacian pdf of X − Y, with its probability mass split between $B_k = 1$ and $B_k = 0$.)

Example: assume the quantization step size is 32. If the previously decoded bits are 001 and $B_k = 1$, the candidate quantization indices range from 0011000 (24) to 0011111 (31), so $(31 - 24 + 1) \times 32 = 256$ probabilities must be summed.

R.P. Westerlaken et al., “Analyzing symbol and bit plane-based LDPC in distributed video coding”, ICIP, 2007.
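A sketch of the summation described above, assuming integer X values, a uniform quantizer where index q covers $[q \cdot step, (q + 1) \cdot step)$, and the Laplacian model $f(X - Y)$. It reproduces the slide's count of 8 indices × 32 values = 256 terms; the function names are illustrative, not the codec's.

```python
import math


def laplace_pdf(d, alpha):
    """Laplacian correlation noise density f(X - Y) = alpha / 2 * exp(-alpha * |X - Y|)."""
    return 0.5 * alpha * math.exp(-alpha * abs(d))


def index_range(prev_bits, bit, num_bits):
    """Quantization indices (num_bits wide) that start with prev_bits followed by bit."""
    prefix = 0
    for b in list(prev_bits) + [bit]:
        prefix = (prefix << 1) | b
    free = num_bits - len(prev_bits) - 1  # bits not decoded yet
    return range(prefix << free, (prefix + 1) << free)


def bit_probability(y, alpha, prev_bits, num_bits, step):
    """P(B_k = 1 | Y = y, previous bits): sum the Laplacian over every integer X
    whose quantization index q = X // step is consistent with each hypothesis."""
    mass = {}
    for bit in (0, 1):
        mass[bit] = sum(
            laplace_pdf(x - y, alpha)
            for q in index_range(prev_bits, bit, num_bits)
            for x in range(q * step, (q + 1) * step)
        )
    return mass[1] / (mass[0] + mass[1])


# Slide example: bits 001 already decoded, B_k = 1, 7-bit indices, step size 32.
r = index_range([0, 0, 1], 1, num_bits=7)
print(r[0], r[-1], len(r) * 32)  # 24 31 256
```

Only the mass consistent with the already decoded bits enters the normalization, which is what makes the result conditional on $B^{k-1}$.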


DISCOVER Video Codec


Ref. X. Artigas et al., PCS, 2007

白育姍

Reconstruction

After channel decoding (LDPCA), the bit planes of the DC band are recovered:

Bit plane 1: 0 0 0 1
Bit plane 2: 0 0 0 1
Bit plane 3: 1 0 0 1
Bit plane 4: 0 0 0 1
Bit plane 5: 0 1 0 0

Reading each column from MSB to LSB gives the quantized DC values 4 (00100), 1 (00001), 0 (00000) and 30 (11110). (Figure: the decoded DC values placed back into the frame; label: zigzag order.)

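Reconstruction

Optimal reconstruction of a coefficient decoded into the quantization interval $[z_i, z_{i+1})$ with side information y:

$$\hat{x}_{opt} = E\left[x \mid x \in [z_i, z_{i+1}), y\right] = \begin{cases} z_i + \dfrac{1}{\alpha} + \dfrac{\Delta}{1 - e^{\alpha \Delta}}, & y < z_i \\ y + \dfrac{\left(\gamma + \frac{1}{\alpha}\right) e^{-\alpha \gamma} - \left(\delta + \frac{1}{\alpha}\right) e^{-\alpha \delta}}{2 - \left(e^{-\alpha \gamma} + e^{-\alpha \delta}\right)}, & y \in [z_i, z_{i+1}) \\ z_{i+1} - \dfrac{1}{\alpha} - \dfrac{\Delta}{1 - e^{\alpha \Delta}}, & y \geq z_{i+1} \end{cases}$$

where $\gamma = y - z_i$, $\delta = z_{i+1} - y$, $\Delta$ is the quantization step size, and $\alpha$ is the model parameter related to the variance $\sigma^2$ of the Laplacian distribution as $\alpha^2 = 2 / \sigma^2$.

D. Kubasov et al., “Optimal reconstruction in Wyner–Ziv video coding with multiple side information”, IEEE Workshop on MMSP, 2007.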
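A direct transcription of the three-case reconstruction formula, as a sketch assuming $\alpha > 0$ and $z_{i+1} > z_i$. The term $\Delta / (1 - e^{\alpha\Delta})$ is computed with `math.expm1` for accuracy; very large $\alpha\Delta$ (above roughly 709) would overflow and is not handled.

```python
import math


def reconstruct(y, z_lo, z_hi, alpha):
    """E[x | x in [z_lo, z_hi), y] under the Laplacian correlation model
    f(x - y) = alpha / 2 * exp(-alpha * |x - y|)."""
    step = z_hi - z_lo                        # Delta, the quantization step size
    tail = -step / math.expm1(alpha * step)   # Delta / (1 - e^{alpha * Delta})
    if y < z_lo:
        return z_lo + 1 / alpha + tail
    if y >= z_hi:
        return z_hi - 1 / alpha - tail
    g, d = y - z_lo, z_hi - y                 # gamma and delta from the slide
    eg, ed = math.exp(-alpha * g), math.exp(-alpha * d)
    return y + ((g + 1 / alpha) * eg - (d + 1 / alpha) * ed) / (2 - (eg + ed))


print(reconstruct(16, 0, 32, 0.1))  # 16.0: y at the center of the bin is kept
```

When y lies outside the decoded interval, the estimate is pulled to the side of the interval nearest to y; when y lies inside, it is nudged toward the center, more strongly for small $\alpha$ (unreliable side information).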

DISCOVER Video Codec

Ref. X. Artigas et al., PCS, 2007

Poor RD performance for high-motion sequences and large GOP sizes.

白育姍

DISCOVER Video Codec

Ref. X. Artigas et al., PCS, 2007

Room for improvement

白育姍

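MLWZ Video Codec

Ref. R. Martin et al., IET Image Processing, 2010

Soft motion field (SMF): every candidate motion vector $(m_x, m_y)$ in the search range between the side information (Y) and the WZ frame (R) carries a probability, e.g., $SMF_1 = 0.1$, $SMF_2 = 0.02$, …, $SMF_{81} = 0.1$.

Update SMF, where $P\{(m_x, m_y)\}$ is the current probability of the candidate and $SSE_n^b(m_x, m_y)$ is its sum of squared errors:

$$SMF_n(m_x, m_y) = P\{(m_x, m_y)\}\, e^{-SSE_n^b(m_x, m_y)}$$

Normalize SMF:

$$SMF_n(m_x, m_y) = \frac{SMF_n(m_x, m_y)}{\sum_{m_x \in [-S, S]} \sum_{m_y \in [-S, S]} SMF_n(m_x, m_y)}$$

白育姍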
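A sketch of the SMF update and normalization. Assumptions: candidates are keyed by $(m_x, m_y)$, `scale` is a hypothetical knob (1.0 gives the slide's $e^{-SSE}$), and the minimum SSE is subtracted before exponentiating. That shift cancels in the normalization but keeps realistic SSE values from underflowing to zero.

```python
import math


def update_smf(prior, sse, scale=1.0):
    """Soft motion field update and normalization over the search range:
    SMF(m) = P{m} * exp(-SSE(m) / scale) / sum over m' of the same quantity.
    prior, sse: dicts keyed by candidate vector (mx, my)."""
    min_sse = min(sse[m] for m in prior)  # shift for numerical stability
    unnorm = {m: prior[m] * math.exp(-(sse[m] - min_sse) / scale) for m in prior}
    total = sum(unnorm.values())
    return {m: value / total for m, value in unnorm.items()}


# Hypothetical 9x9 search range (S = 4, 81 candidates) with a uniform prior.
S = 4
cands = [(mx, my) for mx in range(-S, S + 1) for my in range(-S, S + 1)]
prior = {m: 1 / len(cands) for m in cands}
sse = {(mx, my): 10.0 * (mx * mx + my * my) for mx, my in cands}
smf = update_smf(prior, sse, scale=20.0)
print(max(smf, key=smf.get))  # (0, 0)
```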

白育姍

MLWZ Video Codec

Ref. R. Martin et al., IET Image Processing, 2010

Side information re-estimation: the DCT of each candidate SI block in the search range, $Y^{DCT}_{n,(m_x, m_y)}$, is weighted by its SMF probability:

$$Y_n^{ML}(u, v) = \sum_{m_x \in [-S, S]} \sum_{m_y \in [-S, S]} Y^{DCT}_{n,(m_x, m_y)}(u, v)\, SMF_n(m_x, m_y)$$

MLWZ Video Codec

Ref. R. Martin et al., IET Image Processing, 2010

Correlation noise distribution modeling: the DCT coefficient of the WZ frame, $X_n^{DCT}(u, v)$, is modeled by a sum of Laplacian distributions, one per candidate SI coefficient $Y^{DCT}_{n,(m_x, m_y)}(u, v)$, weighted by the SMF and sharing the Laplacian parameter $\hat{\alpha}_n(u, v)$:

$$p\left(X_n^{DCT}(u, v)\right) = \sum_{m_x \in [-S, S]} \sum_{m_y \in [-S, S]} SMF_n(m_x, m_y)\, \frac{\hat{\alpha}_n(u, v)}{2}\, e^{-\hat{\alpha}_n(u, v) \left| X_n^{DCT}(u, v) - Y^{DCT}_{n,(m_x, m_y)}(u, v) \right|}$$

Sum of Laplacians!

白育姍
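A sketch of the two motion-learning formulas for a single coefficient (u, v): the SMF-weighted side information $Y^{ML}$ and the sum-of-Laplacians density. The candidate coefficients and SMF weights in the example are hypothetical values, and the function names are illustrative.

```python
import math


def ml_side_info(candidates, smf):
    """Y^ML(u, v) = sum over candidates m of Y^DCT_m(u, v) * SMF(m), for one coefficient.
    candidates: dict m -> DCT coefficient of the SI block displaced by m."""
    return sum(candidates[m] * smf[m] for m in smf)


def mixture_pdf(x, candidates, smf, alpha):
    """Sum-of-Laplacians model:
    p(x) = sum_m SMF(m) * alpha / 2 * exp(-alpha * |x - Y^DCT_m|)."""
    return sum(smf[m] * 0.5 * alpha * math.exp(-alpha * abs(x - candidates[m]))
               for m in smf)


# Hypothetical example: two candidate SI coefficients with SMF weights 0.75 and 0.25.
cands = {(0, 0): 10.0, (1, 0): 20.0}
weights = {(0, 0): 0.75, (1, 0): 0.25}
print(ml_side_info(cands, weights))  # 12.5
```

Because the SMF weights sum to 1, the mixture is itself a valid density; unlike a single Laplacian centered on $Y^{ML}$, it can keep two distinct motion hypotheses alive.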

MLWZ Video Codec

Ref. R. Martin et al., IET Image Processing, 2010

Improves RD performance for high-motion sequences and large GOP sizes.

Room for improvement

白育姍

DISPAC Video Codec

邱柏叡

邱柏叡

Half-pixel motion estimation:

$$MAD(d_x, d_y) = \frac{1}{N} \sum_{(x, y) \in B} \left| X_R(x, y) - X_P(x + d_x, y + d_y) \right|$$

$$MAD(d_x, d_y) = \frac{1}{N} \sum_{(x, y) \in B} \left| X_R(x, y) - X_F(x + d_x, y + d_y) \right|$$

$$CF(d_x, d_y) = MAD(d_x, d_y)\left(1 + K\left(d_x^2 + d_y^2\right)\right)$$

白育姍

Annotations on the DISPAC modules (figure):
- Reduce decoding time and improve RD performance
- Improve subjective quality
- Improve SI for motion learning
- For low-motion parts
- For high-motion parts
- Improve initial SI and motion learning

DISPAC Video Codec

邱柏叡

邱柏叡, 白育姍

程瀚平


RD Performance of DISPAC

Test sequences: Soccer, Foreman, Coastguard and Hall Monitor (ordered from high to low motion)
- QCIF, 15 Hz, all frames (150 frames for Soccer, Foreman and Coastguard; 164 for Hall Monitor)
- GOP size: 2, 4, 8
- Bitrate and PSNR: luminance component only


RD Performance (GOP=2)



RD Performance (GOP=4)


RD Performance (GOP=8)

(Figure: RD curves for GOP = 8, annotated with PSNR gains of 3.6, 3.1, 0.9, 2.6, 3.1, 1.6, 0.2 and 2.6 dB.)


Parallelizing DISPAC Decoder

(Figure: DISPAC decoder modules, labeled OpenMP and CUDA.)

白育姍

邱柏叡

邱柏叡

Side Information Re-Creation

Assume a QCIF sequence with 800 4x4 WZ blocks and 1024 search candidates within the search range.

(Figure labels: first iteration (128 candidates), second iteration (128 candidates), texture memory.)

Side Information Re-Creation

Reduction algorithm (Mark Harris, “Optimizing parallel reduction in CUDA”, NVIDIA Developer Technology, 2007), used with the matching criteria:

$$MAD(d_x, d_y) = \frac{1}{N} \sum_{(x, y) \in B} \left| X_R(x, y) - X_B(x + d_x, y + d_y) \right|$$

$$MAD(d_x, d_y) = \frac{1}{N} \sum_{(x, y) \in B} \left| X_R(x, y) - X_F(x + d_x, y + d_y) \right|$$

$$CF(d_x, d_y) = MAD(d_x, d_y)\left(1 + K\left(d_x^2 + d_y^2\right)\right)$$
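A sequential Python emulation of the reduction pattern, adapted from summation to finding the candidate with minimum cost. This is not the thesis's CUDA kernel: it mimics the "sequential addressing" scheme from Harris's notes, where at each step thread t (t < stride) compares slot t with slot t + stride. The batch of 128 candidates echoes the previous slide and is an assumption here.

```python
def tree_argmin(costs):
    """Emulate a shared-memory parallel reduction (sequential addressing) that
    returns (index, value) of the minimum cost. Each while-iteration is one
    synchronized step; the inner loop stands in for parallel threads.
    Assumes len(costs) is a power of two (e.g., 128 candidates per iteration)."""
    vals = list(costs)
    idx = list(range(len(costs)))
    stride = len(vals) // 2
    while stride > 0:
        for t in range(stride):  # on a GPU these comparisons run concurrently
            if vals[t + stride] < vals[t]:
                vals[t], idx[t] = vals[t + stride], idx[t + stride]
        stride //= 2
    return idx[0], vals[0]


costs = [abs(i - 77) + 5 for i in range(128)]  # hypothetical CF values of 128 candidates
print(tree_argmin(costs))  # (77, 5)
```

Within one step, every "thread" reads slot t + stride and writes slot t, and the two ranges never overlap, so the sequential loop gives the same result a parallel step would. 128 candidates need only log2(128) = 7 steps.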

Parallelizing DISPAC Decoder

(Figure: DISPAC decoder modules, labeled CUDA.)

白育姍

邱柏叡

邱柏叡

Correlation Noise Distribution Modeling

Assume a QCIF sequence with 800 4x4 WZ blocks and 1024 possible integer values of X − Y for DCT coefficient band 2.

(Figure: a frame of 176/4 × 144/4 blocks, each marked WZ, Skip or Intra; for every WZ block, the correlation noise model $P_{CNM}$, a sum of Laplacian pdfs, is evaluated at each of the 1024 integer values of X − Y.)


Correlation Noise Distribution Modeling


Conditional Bit Prob Computation

$P(B_k = 1 \mid Y, B^{k-1})$: the probability that the k-th bit is 1 given the side information Y and the k−1 previously decoded bits $B^{k-1}$.

(Figure: a frame of 176/4 × 144/4 blocks marked WZ, Skip or Intra; for each WZ block, the distribution $P_{CNM}$ of X − Y is a sum of Laplacian pdfs, whose mass is split between $B_k = 1$ and $B_k = 0$.)

Example: assume the quantization step size is 32. If the previously decoded bits are 001 and $B_k = 1$, the quantization indices range from 0011000 (24) to 0011111 (31), so $(31 - 24 + 1) \times 32 = 256$ probabilities must be summed.

R.P. Westerlaken et al., “Analyzing symbol and bit plane-based LDPC in distributed video coding”, ICIP, 2007.


Conditional Bit Prob Computation



Decoding speed of DISPAC

- A workstation with an Intel Xeon E5530 CPU at 2.4 GHz and an NVIDIA Tesla C1060 graphics card is used to emulate the basic unit of a cloud computing environment.
- Operating system: Debian squeeze/sid with the 2.6.32-5-amd64 kernel.
- QCIF, 15 Hz, whole sequence, GOP size 8, quantization matrix 8 (Q8).

Decoding speed of DISPAC

(Figure: bottleneck analysis of sequential decoding; CNM: Correlation Noise Modeling.)

Decoding speed of DISPAC

Speedup ratio of decoding modules (8 cores + GPU over sequential decoding):
- LDPCA decode: Foreman 22.81, Soccer 16.64, Coastguard 27.77, Hall Monitor 9.21
- CNM: Foreman 232.06, Soccer 179.95, Coastguard 293.17, Hall Monitor 184.08
- SI re-creation: Foreman 120.27, Soccer 115.51, Coastguard 126.88, Hall Monitor 104.39
- Others: Foreman 8.75, Soccer 8.43, Coastguard 9.06, Hall Monitor 8.02

Decoding speed of DISPAC

Average decoding time per frame (sec.):
- Foreman: DISCOVER 84.78, MLWZ 75.3, DISPAC 48.35 (sequential); DISPAC 1.54 (8 cores + GPU)
- Soccer: DISCOVER 81.31, MLWZ 84.38, DISPAC 29.83 (sequential); DISPAC 1.33 (8 cores + GPU)
- Coastguard: DISCOVER 62.31, MLWZ 74.72, DISPAC 77.95 (sequential); DISPAC 1.9 (8 cores + GPU)
- Hall Monitor: DISCOVER 13.78, MLWZ 33.18, DISPAC 15.93 (sequential); DISPAC 1.19 (8 cores + GPU)

Decoding speed of DISPAC

Speedup ratio (compared to sequential DISCOVER, which is 1.00 in every case):
- Foreman: MLWZ 1.13, DISPAC 1.75 (sequential); DISPAC 55.12 (8 cores + GPU)
- Soccer: MLWZ 0.96, DISPAC 2.73 (sequential); DISPAC 60.97 (8 cores + GPU)
- Coastguard: MLWZ 0.83, DISPAC 0.80 (sequential); DISPAC 32.79 (8 cores + GPU)
- Hall Monitor: MLWZ 0.42, DISPAC 0.87 (sequential); DISPAC 11.57 (8 cores + GPU)
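The speedup ratios are plain divisions of the DISCOVER decoding time by each configuration's time. Recomputing them from the rounded per-frame times reproduces the sequential ratios exactly. The 8 cores + GPU ratios come out slightly different (e.g., 55.05 instead of 55.12 for Foreman, 61.14 instead of 60.97 for Soccer), presumably because the displayed times are rounded.

```python
# Average decoding time per frame (sec.) from the decoding-time slide:
# (DISCOVER seq, MLWZ seq, DISPAC seq, DISPAC 8 cores + GPU)
times = {
    "Foreman": (84.78, 75.3, 48.35, 1.54),
    "Soccer": (81.31, 84.38, 29.83, 1.33),
    "Coastguard": (62.31, 74.72, 77.95, 1.9),
    "Hall Monitor": (13.78, 33.18, 15.93, 1.19),
}


def speedups(t):
    """Speedup of each configuration relative to sequential DISCOVER (t[0])."""
    base = t[0]
    return tuple(round(base / x, 2) for x in t)


for seq, t in times.items():
    print(seq, speedups(t))
```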


Conclusions

- DISPAC combines coding tools developed in recent literature (e.g., the MLWZ codec) with some newly developed modules (block mode selection, SI re-creation and an adaptive deblocking filter), achieving up to 3.6 dB gain in RD performance.
- The decoding modules can be highly parallelized: with 8 CPU cores and a GPU, DISPAC decodes up to 61 times faster than the state-of-the-art DISCOVER codec.

Future Work

- Update the correlation noise model parameter during the decoding process (for RD performance).
- Improve the parallelizability of the parallel LDPCA decoding algorithm for small parity-check matrices (for decoding speed).
- Build a WZ to H.264 video transcoder (for a real demo system).

Thank You