Robust Low-Latency Voice and Video Communication over
Best-Effort Networks
Department of Electrical Engineering, Stanford University
March 12, 2003
http://www.stanford.edu/~yiliang/
Yi Liang
Liang: Robust Low-Latency Voice and Video Communication 2
Media Delivery over IP Networks
Internet
QoS Concerns and Challenges
Communication over best-effort networks …
Delay: impairs the interactivity of conversational services. Voice over IP: recommended one-way delay < 150 ms [ITU-T G.114].
Packet loss: impairs perceptual quality.
Delay jitter: obstructs sequential and continuous media output.
Outline of Contributions
[Diagram: server and client connected through a packet network, annotated with I. Client side, II. Transport, III. Network-adaptive coding]
I. Client side
Adaptive playout scheduling for VoIP that reduces latency and packet loss
II. Transport
Packet path diversity and applications in low-latency communications
III. Network-adaptive coding
Low-latency video communication that does not require packet retransmission
Delay Jitter and Buffering
[Figure: late loss rate (%) versus average buffering delay (ms) under a fixed playout schedule; labels: late loss, buffering delay]
Adaptive Playout Scheduling (1)
[Figure: buffering delay under the adaptive playout schedule compared with the fixed schedule]
Adaptive Playout Scheduling (2)
Requires media scaling: slow down or speed up
[Figure: timelines for sender, receiver, and playout of packets 1 to 8; the axis is time and the packet spacing is the packetization time]
1. How to set the playout schedule?
2. How to scale the media?
3. Quality of scaled voice?
Determine the Playout Schedule
[Figure: delay histogram, probability versus delay (ms), with the playout deadline and the resulting loss probability marked]
Next packet: given the acceptable loss rate, find the playout deadline
History-based estimation using the past w delays
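The history-based estimate can be sketched as an order statistic over the most recent delays. This is a minimal illustration, not the exact estimator used in the talk: the function name `playout_deadline`, the empirical-quantile rule, and the sample delays are assumptions made for the example.

```python
import math

def playout_deadline(past_delays, loss_rate, w=None):
    """Smallest deadline such that at most `loss_rate` of the last w delays exceed it.

    Assumption: the deadline is read off the empirical delay histogram.
    """
    window = list(past_delays)[-w:] if w else list(past_delays)
    if not window:
        raise ValueError("need at least one past delay")
    ordered = sorted(window)
    # Number of packets in the window that are allowed to arrive late.
    allowed_late = min(math.floor(loss_rate * len(ordered)), len(ordered) - 1)
    # The deadline is the delay of the slowest packet that must still be on time.
    return ordered[len(ordered) - 1 - allowed_late]

# Hypothetical one-way delays in ms.
delays = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(playout_deadline(delays, loss_rate=0.1))        # deadline over the full history
print(playout_deadline(delays, loss_rate=0.2, w=5))   # deadline over the last 5 delays
```

A smaller window w tracks changing network conditions faster but gives a noisier estimate of the delay distribution.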
Voice Scaling Using Time-Scale Modification
Based on WSOLA [Verhelst ‘93]
Preserves pitch
Improved to scale short individual voice packets; no delay
[Figure: packet expansion, showing the input and output segments, the pitch period, a template segment, and the similar segment matched to it]
Examples of Time-Scale Modification
Speech scaling samples: original, 130%, 70%
Audio scaling samples: original, 130%, 70%
Quality of Time-Scale Modified Voice
Packets scaled: 18.4%
Scaling ratio: 50 - 200%
DMOS: 4.5 out of 5 [ITU-T P.800]
[Audio samples under the adaptive playout schedule: original and modified]
Results and Comparison
Algorithms:
1. Fixed playout schedule
2. Only adjust playout schedule during silence periods
[Ramjee ’94; Moon ‘98]
3. Adaptive playout scheduling
[Figure: panels for algorithms 1, 2, and 3]
Overall Performance
Trace: Stanford to Chicago
Alg.     Loss rate   MOS
Alg. 2   10%         2.6
Alg. 3   4%          3.7
MOS scale: 1 - 5 [ITU-T P.800]
[Figure annotation: -50%]
Subjective Listening Test Results
[Figure: bar chart of MOS (0 to 5) for Alg. 2 and Alg. 3 on four traces from Stanford: 1. Chicago, 2. Germany, 3. MIT, 4. China]
Summary
Adaptive Playout Scheduling
Improves the tradeoff between buffering delay and packet loss
Time-scale modification-based speech processing does not impair speech quality
Overall speech quality improves by 1 on a 5-point MOS scale
The passive algorithm can be easily implemented on the client
Audacity T2, 8X8, Inc.
Outline
I. Client side
Adaptive playout scheduling for VoIP that reduces latency and packet loss
II. Transport
Packet path diversity and applications in low-latency communications
III. Network-adaptive coding
Low-latency video communication that does not require packet retransmission
Packet Path Diversity
Motivation
Typically a better alternative path exists [Savage, SigComm ’99]
Uncorrelated packet loss on independent paths [Apostolopoulos ’01]
Low-latency requirement
[Diagram: sender, receiver, paths 1 and 2, and relay servers]
Internet Experiments
[Diagram: hosts at Santa Clara, CA (192.84.16.176), MIT (18.184.0.50), and Harvard (140.247.62.110), in the roles of sender, relay server, and receiver, connected through the Qwest, Exodus Comm., and BBN Planet networks; delays incurred on a link or ISP network are labeled 5 ms, 45 ms, 40 ms, and 5 ms]
Measured Packet Delay Trace
Adaptive Playout Scheduling for Two-Stream
Multiple Description Speech Coding
Complementary and redundant descriptions of media
Stream 1: even samples use finer quantization, odd samples use coarser quantization
Stream 2: vice versa [Jiang, Ortega ’00]
[Figure: streams s1 and s2 over time, with alternating even (E) and odd (O) samples; each block spans one packet length]
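The two-description idea can be sketched with a pair of uniform quantizers. This is only an illustration under stated assumptions: the step sizes `fine=1` and `coarse=8`, the function names, and the sample values are hypothetical and are not taken from the cited coder.

```python
def quantize(x, step):
    """Uniform quantizer with the given step size."""
    return round(x / step) * step

def encode(samples, fine=1, coarse=8):
    """Build two descriptions: stream 1 is fine on even samples, stream 2 on odd samples."""
    s1 = [quantize(x, fine if i % 2 == 0 else coarse) for i, x in enumerate(samples)]
    s2 = [quantize(x, coarse if i % 2 == 0 else fine) for i, x in enumerate(samples)]
    return s1, s2

def decode(s1=None, s2=None):
    """Use the finely quantized copy of each sample when both streams arrive."""
    if s1 is not None and s2 is not None:
        return [a if i % 2 == 0 else b for i, (a, b) in enumerate(zip(s1, s2))]
    # Only one description arrived: play it as is, at reduced quality.
    return s1 if s1 is not None else s2

samples = [3, 5, 10, 13]
s1, s2 = encode(samples)
print(decode(s1, s2))   # both descriptions received
print(decode(s1=s1))    # only stream 1 received
```

With both descriptions the decoder recovers the fine quantization of every sample; with one, every other sample falls back to the coarse version rather than being lost.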
Determine the Playout Schedule
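Choose the playout deadline d to minimize the Lagrangian cost function

$$C = d + \lambda_1 \Pr\{\text{both descriptions lost} \mid d\} + \lambda_2 \Pr\{\text{only one description lost} \mid d\}$$
$$C = d + \lambda_1 p_1 p_2 + \lambda_2 \big((1 - p_1)\,p_2 + (1 - p_2)\,p_1\big)$$

[Figure: delay distributions of stream 1 and stream 2; p_1 and p_2 are the probabilities that a packet on stream 1 or stream 2 arrives after the deadline d]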
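A minimal sketch of the deadline search, assuming p_1 and p_2 are estimated as the fraction of recently observed delays on each stream that exceed d, and that candidate deadlines are the observed delay values. The function names, the weights, and the delay samples are hypothetical.

```python
def late_prob(delays, d):
    """Empirical probability that a packet arrives after deadline d."""
    return sum(1 for x in delays if x > d) / len(delays)

def cost(d, delays1, delays2, lam1, lam2):
    """Lagrangian cost: delay plus weighted loss probabilities of both / only one description."""
    p1 = late_prob(delays1, d)
    p2 = late_prob(delays2, d)
    both_lost = p1 * p2
    one_lost = (1 - p1) * p2 + (1 - p2) * p1
    return d + lam1 * both_lost + lam2 * one_lost

def best_deadline(delays1, delays2, lam1, lam2, candidates=None):
    """Exhaustive search over candidate deadlines (assumed: the observed delays)."""
    if candidates is None:
        candidates = sorted(set(delays1) | set(delays2))
    return min(candidates, key=lambda d: cost(d, delays1, delays2, lam1, lam2))

# Hypothetical delay histories in ms for the two streams.
stream1 = [20, 30, 80]
stream2 = [25, 35, 40]
print(best_deadline(stream1, stream2, lam1=1000, lam2=10))
```

Setting lam1 much larger than lam2 penalizes losing both descriptions far more than losing one, so the search accepts a deadline that occasionally misses the slower stream as long as the other stream arrives in time.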
Overall Performance
[Figure: loss rate (%) versus average end-to-end delay (ms); annotation: -35 ms]
Summary
Packet Path Diversity
Exploitation of statistically uncorrelated delay jitter and packet loss behavior
Adaptive playout scheduling for multiple streams provides lower latency and reduced distortion
Outline
I. Client side
Adaptive playout scheduling for VoIP that reduces latency and packet loss
II. Transport
Packet path diversity and applications in low-latency communications
III. Network-adaptive coding
Low-latency video communication that does not require packet retransmission
Low-Latency Video Communication
Motivation for low-latency video: real-time conversational services, interactive video streaming
Voice vs. video
                     Voice over IP                          Typical video streaming
Latency              < 150 ms                               5 ~ 15 seconds pre-roll time
Packet dependency    Weak or no dependency across packets   Strong dependency across packets due to motion-compensated coding
Low-Latency Video — Challenges
What the problems are
Packet dependency due to hybrid motion-compensated coding
Large receiver buffer and packet retransmission employed
[Figure: the "P-I" scheme, a frame sequence I P P P P P P P over time with interframe prediction; a transmission error is marked]
Approaches
Goal: achieve VoIP-like latency
Approach: eliminate the need for retransmission; robust network-adaptive coding by optimal packet dependency management
Coding Mode
[Figure: coding modes P1, P2, …, P5, …, INTRA along a coding-mode axis, in order of increased error resilience]
Error-Resilience vs. Compression Efficiency
Foreman sequence coded at PSNR = 35.9 dB (H.26L TML 8.5, 30 fps, 270 frames)
[Figure: rate (Kbps) versus coding mode, up to INTRA; moving toward INTRA gives increased error resilience and decreased compression efficiency]
Determine R-D Optimized Coding Modes
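Select the prediction mode that minimizes the R-D cost

$$v_{opt}(n) = \arg\min_{v \in \{1, 2, \ldots, V, I\}} J_v(n), \qquad J_v = D_v + \lambda R_v$$

where v is the coding mode, V is the length of the long-term memory, and I denotes INTRA coding.
[Figure: candidate modes with their rate-distortion pairs, P1: (R_1, D_1), P2: (R_2, D_2), …, PV: (R_V, D_V), I: (R_I, D_I)]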
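The mode decision is a plain arg-min over the candidate costs. The sketch below assumes each mode's rate and expected distortion are already known; the function name `select_mode` and the numbers are hypothetical and serve only to show how the multiplier shifts the choice.

```python
def select_mode(candidates, lam):
    """Return the mode v minimizing J_v = D_v + lam * R_v.

    `candidates` maps a mode label to a (rate, distortion) pair.
    """
    return min(candidates, key=lambda v: candidates[v][1] + lam * candidates[v][0])

# Hypothetical (rate, distortion) pairs for modes P1, P2 and INTRA ("I").
modes = {1: (10, 50), 2: (12, 30), "I": (40, 10)}
print(select_mode(modes, lam=0.1))   # rate is cheap: the most robust mode wins
print(select_mode(modes, lam=1.0))
print(select_mode(modes, lam=20.0))  # rate is expensive: the cheapest mode wins
```

A larger multiplier makes rate more expensive and pushes the choice toward the most efficient prediction mode; a smaller one favors error resilience.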
Estimation of Distortion
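[Figure: binary tree of loss outcomes for frames n-3, n-2, n-1 leading up to frame n; each branch is taken with probability p or 1-p]
For each candidate mode v with rate-distortion pair (R_v, D_v), the expected distortion is the probability-weighted sum over the loss outcomes.
Mode P1, eight outcomes:
$$D_1 = \sum_{i=1}^{8} p_{1i} D_{1i}, \qquad p_{11} = (1-p)^3,\; p_{12} = (1-p)^2 p,\; \ldots,\; p_{18} = p^3$$
Mode P2, four outcomes:
$$D_2 = \sum_{i=1}^{4} p_{2i} D_{2i}, \qquad p_{21} = (1-p)^2,\; p_{22} = (1-p)p,\; p_{23} = p(1-p),\; p_{24} = p^2$$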
Channel feedback utilized at the source coder
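The expected distortion of a mode can be sketched by enumerating loss patterns over the frames whose delivery status is still unknown. This assumes independent losses with probability p per frame; `expected_distortion` and the toy distortion function are hypothetical stand-ins for the decoder-side distortion of each outcome.

```python
from itertools import product

def expected_distortion(k, p, distortion_of):
    """Sum of probability * distortion over all 2**k loss patterns of k frames.

    `distortion_of` maps a tuple of booleans (True = frame lost) to a distortion value.
    """
    total = 0.0
    for pattern in product((False, True), repeat=k):
        prob = 1.0
        for lost in pattern:
            prob *= p if lost else (1.0 - p)
        total += prob * distortion_of(pattern)
    return total

# Toy model: every lost frame adds 10 units of distortion.
toy = lambda pattern: 10 * sum(pattern)
print(expected_distortion(3, 0.1, toy))  # eight outcomes
print(expected_distortion(2, 0.1, toy))  # four outcomes
```

With k = 3 the enumeration visits eight outcomes and with k = 2 four, matching the outcome counts listed for modes P1 and P2.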
Experimental Results
Comparing
1. Rate-distortion optimized dependency management
2. Simple P-I
[Figure: prediction structures of schemes 1 and 2, including the sequence I P P P P P …]
R-D Performance (1)
No retransmission; no algorithm delay; channel loss rate = 10%
[Figure: R-D curves; annotations: 1.2 dB, 36%]
R-D Performance (2)
No retransmission; no algorithm delay; channel loss rate = 10%
R-D Performance (3)
Bitrate 200 Kbps, various channel loss rates
Video Demo (1)
R-D optimized vs. simple P-I
Foreman, 109 Kbps, 10% channel loss
No retransmission; no algorithm delay
Video Demo (2)
R-D optimized vs. simple P-I
Mother-Daughter, 318 Kbps, 10% channel loss
No retransmission; no algorithm delay
Summary
Network-Adaptive Packet Dependency Management
R-D optimization improves the tradeoff between error-resilience and compression efficiency
Eliminated the need for packet retransmission; achieved VoIP-like low latency
Summary of Contributions
[Diagram: server and client connected through a packet network, annotated with I. Client side, II. Transport, III. Network-adaptive coding]
I. Client side
Adaptive playout scheduling that reduces latency and packet loss
II. Transport
Packet path diversity that further reduces communication delay and distortion
III. Network-adaptive coding
A video communication system that requires no packet retransmission, which allows VoIP-like low latency
Other Contributions
Other contributions not covered in this presentation
A low-latency loss concealment scheme
Packet path diversity for robust low-latency video communication
A layered coding structure to avoid mismatch error for streaming of pre-coded video
An accurate model to quantify video distortion as a result of packet losses
A prescient scheme that optimizes the dependency for a group of packets for video streaming
Publications
Journal publications: 3
IEEE Transactions on Multimedia
Journal of Wireless Communication and Mobile Computing
IEEE Transactions on Circuits and Systems for Video Technology
Invited papers: 4
Papers in conference proceedings: 8
Proceedings ACM Multimedia (SigMM)
… …
Media Delivery over IP Networks
Internet
Low-Latency Media Communication
Acknowledgements
Committee members, EE faculty
My family members
Our sponsors
IVMS group members and alumni,
and assistants
Many friends, in ISL, EE, and Stanford
Backup Slides
The following backup slides may or may not be used …
Determine the Playout Schedule
[Figure: delay histogram, percentage versus delay (ms), with markers d and l̂]
Likelihood Ratio Factor
w
i s
siD
wlrf
12
2)(1
[Gibbon, Little, ‘96]
More Samples for Time-Scale Modification
Audio scaling
Original Expanded by 20% Compressed by 20%
Low-Latency Loss Concealment
Earlier work [Stenger ’96]
Algorithm delay reduced to one packet time
Nicely integrates into adaptive playout system
20% random packet loss, audio samples: original, loss, concealed
[Figure: timeline of packets i-2 to i+2 with packet i lost; packets i-1 and i+1 of length L, with labels 2L and 1.3L; alignment found by correlation]
Speech Samples
Alg.       Loss rate   MOS
Alg. 2     10%         2.6
Alg. 3     4%          3.7
Original               4.4
Overall Performance
[Figure: four panels of results for traces from Stanford to 1. Chicago, 2. Germany, 3. MIT, 4. China]
Multi-Stream Playout Scheduling
[Figure: timelines for packets 1 to 6: sending and receiving on path 1, sending and receiving on path 2, and the resulting playout]
Packet path diversity reduces effective delay jitter and therefore late loss rate
Path Diversity – Voice Demo
Audio samples: original; path diversity; single stream with FEC at the same data rate
For both transmitted samples: average total end-to-end delay 84 ms; error concealment by speech segment repetition
More Experiment Results
Results obtained by varying λ_2 while keeping λ_1 fixed
With higher delay: better chances to play both descriptions
Lower playout rate variation observed when using multiple streams
Jitter averaged; lower standard deviation of min(d_i, d_j)
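The last point can be illustrated with a small sketch, assuming that a packet can be played as soon as the earlier of its two descriptions arrives, so the effective delay is the minimum over the two paths. The delay values below are hypothetical and chosen only for illustration.

```python
from statistics import pstdev

def effective_delays(path1_delays, path2_delays):
    """Per-packet delay when the earlier of the two descriptions is played."""
    return [min(a, b) for a, b in zip(path1_delays, path2_delays)]

# Hypothetical one-way delays in ms on two paths with uncorrelated jitter.
path1 = [40, 90, 45, 120]
path2 = [85, 42, 110, 50]
combined = effective_delays(path1, path2)

print(combined)
print(pstdev(path1), pstdev(path2), pstdev(combined))
```

When delay spikes on the two paths rarely coincide, the minimum stays close to the low-delay baseline, which is why its spread is smaller than that of either path alone.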
PESQ Results
Perceptual Evaluation of Speech Quality (ITU-T Rec. P.862, Feb. 2001)
PESQ can be used for end-to-end quality assessment
Ranges from –0.5 to 4.5 but usually produces MOS-like scores between 1.0 and 4.5
Internet Experiment (2)
[Diagram: hosts in New Jersey (165.230.227.81), Harvard (140.247.62.110), and Erlangen (131.188.130.136), connected through the VBNS IP Backbone Service, DANTE Operations, UUNET Tech., and AT&T networks; link and ISP delays are labeled 7 ms, 40 ms, 5 ms, 5 ms, and 10 ms]
Path 1 (direct): N. J. – Erlangen
Path 2 (alternative): N. J. – Harvard – Erlangen
Results (2)
Path 1 (direct): N. J. – Germany
Path 2 (alternative): N. J. – Harvard – Germany
Mean delay: 61.3 / 65.0 ms
Link loss: 0.6% / 1.1%
Significant reduction of late loss and end-to-end delay by packet path diversity
Video Streaming Using Path Diversity
[Figure: frames n-5 to n sent over path 1 and path 2]
Next frame to encode and send: n
Goal: minimize distortion under a rate constraint
(1) Path selection to minimize the loss probability of frame n and maximize the benefit of path diversity
    Alternate when both channels are good
    Send small probe packets over the channel in bad state
    [Setton, Liang, Girod, ICME’03, submitted]
(2) Source coding
Determine Prediction Mode
vvopt
vvv
Jv
In
n-vv
vRDJ
min arg
as channel same
theover sent frame
for ,
}{}1{}
|{
Long-Term Memory V=5
n-5 n-4 n-3 n-2 n-1 n
Prediction modes: v=1, 2, … V, I
},5,3,2,1{ I
V=1
,1J
V=2
,2J
V=3
,3J
V=5
,5J IJ
Path 1
Path 2
Results (1)
Channel loss rate 1 = loss rate 2 = 15%
Average burst length = 8; feedback delay = 6
Compared with RPS-NACK [Lin, ICME’01] and video redundancy coding (VRC) [H.263++]
Results (2)
Channel loss rate 1 = loss rate 2 = 15%
Average burst length = 8; feedback delay = 6
Path Diversity Gain with Shared Link
TCP-Friendly Streaming
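$$r = \frac{1.22 \cdot MTU}{RTT \cdot \sqrt{p}}$$
where r is the TCP-friendly sending rate, MTU is the packet size, RTT is the round-trip time, and p is the packet loss rate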
[Mahdavi, Floyd, ‘97][Floyd, Handley, Padhye, Widmer, ‘00]
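As a worked example of the rate formula, the sketch below evaluates it for one hypothetical operating point. The function name, the units (bytes and seconds), and the input values are assumptions for illustration.

```python
import math

def tcp_friendly_rate(mtu_bytes, rtt_s, loss_rate):
    """TCP-friendly rate in bytes per second: 1.22 * MTU / (RTT * sqrt(p))."""
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return 1.22 * mtu_bytes / (rtt_s * math.sqrt(loss_rate))

# Hypothetical path: 1500-byte packets, 100 ms round-trip time, 1% loss.
rate = tcp_friendly_rate(1500, 0.1, 0.01)
print(f"{rate * 8 / 1e6:.2f} Mbit/s")
```

Because the rate scales with one over the square root of p, quadrupling the loss rate halves the allowed sending rate.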
Long-Term Memory Prediction and Packet Dependency
To manage prediction dependency
Long-term memory (LTM) prediction on macroblock level [Wiegand, Zhang, Färber, Girod, ’99, ’00]
Reference Picture Selection (RPS) [Annex N H.263+, Annex U H.263++, H.26L]
NEWPRED [ISO/IEC MPEG-4]
NACK
R-D Optimization
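$$J_v = D_v + \lambda R_v$$
$$D_v = \sum_{l=1}^{L(n)} p_{vl} D_{vl}$$
$$\lambda = 5\, e^{0.1 Q}\, \frac{Q + 5}{34 - Q}$$
where Q is the quantization parameter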
[H.26L TML 8.5] [Wiegand, Girod, ICIP’01]
Dynamic PSNRs
Streaming of Pre-Encoded Media
Media pre-coded and pre-stored offline
Bit-stream assembly at streaming time
Pre-coded content benefits a large number of users
One potential problem …
Potential Mismatch Error
[Figure: encoded streams S1 and S2, one all P-frames (P P P P P P …) and one all I-frames (I I I I I I …); the transmitted and decoded sequence is P P P I P P …, and a mismatch is marked at the decoder]
Previous schemes using S-frames [Färber, ICIP’97] and SP-frames [H.26L] alleviate or solve the problem at the cost of higher bitrate
Layered Coding Structure for Bitstream Assembly
[Figure: layered coding structure for V = 5 and T_GOP = 25, showing LAYER I with I-frames, LAYER III with P5 frames, and SYNC-frames that allow switching]
P-I for Comparison
[Figure: P-I streams for comparison, each coded as I P P P P P P P P P I …]
Cost of Error-Resilience (1)
Error resilience and low latency are not free
PSNR (dB)   Bitrate increase for 5% loss   Bitrate increase for 10% loss
33.4        17%                            39%
35.9        20%                            43%
37.8        14%                            35%
Distortion at the encoder. V = 5, d_fb = 7.
Cost of Error-Resilience (2)
PSNR (dB)   Bitrate increase for 5% loss   Bitrate increase for 10% loss
35.0        20%                            52%
36.4        17%                            45%
39.3        22%                            46%
40.0        16%                            40%
Distortion at the encoder. V = 5, d_fb = 7.
Cost of Layered Coding Structure (1)
Lossless channel: V = 5, d_fb = 5, p = 0
[Figure annotations: 23%, 25%, 30%, 32%]
Cost of Layered Coding Structure (2)
Channel loss rate = 5%: V = 5, d_fb = 5, p = 0.05
Comparing Different Error-Resilience Schemes
Scheme               Latency      R-D cost      Resilience to burst loss
ARQ                  High         Low           Low
FEC                  Medium-low   Medium-high   Medium-low, depending on delay
Dependency control   Very low     Medium-high   High
Recommended