Objective and Subjective Degradations of Transcoded Voice for Heterogeneous Radio Networks Interoperability

Ľubica Blašková (1), Jan Holub (1), Michael Street (2), Filip Szczucki (2) and Ondřej Tomíška (1)

(1) FEE CTU, Prague
(2) NATO C3 Agency, The Hague

Presentation Outline

• Voice Transcoding – Issue of Modern Heterogeneous Networks
• Speech Transmission Quality Measurements
• Experiments Performed
• Results
• Conclusions

Voice Transcoding I

• Effective voice communications will remain a key service for those operating in a tactical environment
• Multinational operations routinely require different tactical communication systems from different nations to connect together, or wired networks to connect to wireless sub-systems
• Different networks apply differing voice encoding methods to the voice signal
• It is recognised that the use of multiple voice coders in series degrades the quality and intelligibility of the resulting voice

Voice Transcoding II

In today's complex telecommunication chains, voice is often coded several times in one communication direction. Examples:

• GSM-to-GSM call: GSM (FR/EFR/HR) – G.711 – GSM (FR/EFR/HR)
• GSM-to-DECT call: GSM-G.711-DECT
• UMTS-to-PSTN: AMR-(G.729)-G.711
• Skype-to-GSM (typ.): Skype-G.729-GSM
• etc.

Voice Transcoding III

For both ad-hoc and permanently interoperating (special) networks, even more coder types must be taken into account:

• TETRA (ACELP)
• STANAG 4591 (MELPe)

Problem: the lower the bit rate, the higher the risk that the coder will not interoperate satisfactorily

Speech Transmission Quality

Perceived connection quality is influenced by many transmission impairments (delay, echo, various kinds of noise, speech (de)coding distortions and artifacts, temporal and amplitude clipping, ...), assessed and measured in MOS (Mean Opinion Score):

5 Excellent
4 Good
3 Fair
2 Poor
1 Bad

• Listening / Conversational Tests (ITU-T P.800)
• Intrusive Algorithmic Methods (P.862 PESQ): a dedicated test call must be established
• Non-intrusive Algorithmic Methods (P.563 3SQM): real voice samples acquired on real connections are processed
• Estimation Algorithmic Methods (P.564, E-model): jitter, delay, packet loss etc. mapped to MOS
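As an illustration of the estimation approach, the E-model (ITU-T G.107) summarises network impairments in a transmission rating R and converts it to an estimated MOS. The sketch below shows only that final conversion step; how R is derived from jitter, delay and packet loss is not covered in this presentation and is not modelled here. The function name `r_to_mos` is chosen for illustration.

```python
def r_to_mos(r: float) -> float:
    """Convert an E-model transmission rating R to an estimated MOS
    using the conversion formula given in ITU-T G.107."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Illustrative R values only; they are not taken from the experiments.
for r in (50.0, 70.0, 93.2):
    print(f"R = {r:5.1f} -> estimated MOS = {r_to_mos(r):.2f}")
```

The conversion saturates at 1.0 and 4.5, so an estimate obtained this way never reaches the top of the five-point listening scale.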

Experiments Performed

Speech database recording:

• No background noise, Hoth noise at +10 dB SNR
• 2 male, 2 female speakers
• ACELP, MELPe, G.729, GSM FR coding (different typical combinations)
• 5 recordings per combination per speaker

Subjective Testing

• ITU-T P.800 methodology
• 38 untrained listeners
• Listening chamber: <190 ms, <10 dB SPL (A)
• Results shown for the "no noise" condition

Technology           MOS-LQSn   CI 95%
ACELP                4.25       0.166
MELPe                2.21       0.184
GSM                  3.64       0.198
G.729                4.00       0.188
ACELP-MELPe          1.30       0.121
MELPe-ACELP          2.89       0.199
ACELP-GSM            3.68       0.199
GSM-ACELP            3.85       0.191
ACELP-G.729          4.22       0.204
G.729-ACELP          3.63       0.203
MELPe-G.729          3.44       0.200
G.729-MELPe          2.46       0.168
MELPe-GSM            3.08       0.197
GSM-MELPe            2.33       0.189
MELPe-G.729-ACELP    3.13       0.203
ACELP-G.729-MELPe    1.75       0.182
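For readers who want to reproduce figures of this kind, the sketch below shows one common way to obtain a mean opinion score and a 95% confidence interval from individual listener votes. It assumes a normal approximation (factor 1.96); the presentation does not state how its CI 95% column was computed, and the votes in the example are hypothetical, not the study's data.

```python
import math
import statistics


def mos_with_ci95(votes):
    """Return (mean opinion score, half-width of the 95% confidence interval).

    Assumption: normal approximation, half-width = 1.96 * s / sqrt(n),
    where s is the sample standard deviation of the votes.
    """
    n = len(votes)
    mean = statistics.mean(votes)
    half_width = 1.96 * statistics.stdev(votes) / math.sqrt(n)
    return mean, half_width


# Hypothetical votes on the 1-5 scale from 38 listeners for one condition.
votes = [5] * 14 + [4] * 16 + [3] * 6 + [2] * 2
mos, ci = mos_with_ci95(votes)
print(f"MOS = {mos:.2f}, CI95 = +/-{ci:.3f}")
```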

Objective Testing (PESQ-LQ, 3SQM)

[Figure, panel "PESQ-LQ after 2nd-order regression": MOS-LQOn (PESQ-LQ, regr.) plotted against MOS-LQSn, both axes from 1 to 5]

[Figure, panel "3SQM (no regression)": MOS-LQOn (3SQM, no regr.) plotted against MOS-LQSn, both axes from 1 to 5]
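The second-order regression applied to PESQ-LQ can be pictured as fitting a quadratic that maps raw objective scores onto the subjective scale. The sketch below is a minimal pure-Python least-squares fit under that assumption; the presentation does not describe the exact fitting procedure, the function names are illustrative, and the sample scores are hypothetical.

```python
def fit_quadratic(xs, ys):
    """Least-squares fit of y = a + b*x + c*x^2; returns (a, b, c)."""
    # Power sums for the 3x3 normal equations and their right-hand side.
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(3):
            if row != col:
                f = m[row][col] / m[col][col]
                m[row] = [u - f * v for u, v in zip(m[row], m[col])]
    return tuple(m[i][3] / m[i][i] for i in range(3))


def apply_quadratic(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    a, b, c = coeffs
    return a + b * x + c * x * x


# Hypothetical raw objective scores and matching subjective MOS values.
raw_objective = [1.8, 2.4, 2.9, 3.3, 3.7, 4.1]
subjective = [1.5, 2.3, 3.0, 3.4, 3.9, 4.2]
coeffs = fit_quadratic(raw_objective, subjective)
regressed = [apply_quadratic(coeffs, x) for x in raw_objective]
print([round(v, 2) for v in regressed])
```

At least three distinct objective scores are needed; otherwise the normal equations are singular and the fit fails.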

Objective Testing II (PESQ-LQ, left: male voices, right: female voices)

[Figure, panel "Male Voices": PESQ-LQ (regressed) plotted against MOS-LQSn, both axes from 1 to 5]

[Figure, panel "Female Voices": PESQ-LQ (regressed) plotted against MOS-LQSn, both axes from 1 to 5]

Results

                              PESQ (P.862 + P.862.1, regressed)   3SQM (P.563)
Correlation                   0.836                               0.370
Maximum positive difference   1.060                               3.288
Maximum negative difference   -1.550                              -2.722
RMSE                          0.560                               1.072
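The four figures of merit in this table can be computed from paired objective and subjective scores as sketched below. Two assumptions are made: the correlation is the Pearson coefficient, and differences are taken as objective minus subjective; the presentation states neither. The subjective values in the example come from the subjective results table (ACELP, MELPe, GSM, G.729), while the objective values are hypothetical.

```python
import math


def compare(objective, subjective):
    """Return correlation, extreme differences and RMSE for paired scores."""
    n = len(objective)
    diffs = [o - s for o, s in zip(objective, subjective)]
    mean_o = sum(objective) / n
    mean_s = sum(subjective) / n
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(objective, subjective))
    var_o = sum((o - mean_o) ** 2 for o in objective)
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    return {
        "correlation": cov / math.sqrt(var_o * var_s),
        "max_pos_diff": max(diffs),  # largest overestimate by the objective method
        "max_neg_diff": min(diffs),  # largest underestimate by the objective method
        "rmse": math.sqrt(sum(d * d for d in diffs) / n),
    }


subjective = [4.25, 2.21, 3.64, 4.00]  # MOS-LQSn: ACELP, MELPe, GSM, G.729
objective = [3.90, 2.60, 3.50, 3.80]   # hypothetical objective scores
for name, value in compare(objective, subjective).items():
    print(f"{name}: {value:.3f}")
```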

Conclusions

• All tandem setups perform with decreased speech transmission quality.

• Both directions (coders "A-to-B" and "B-to-A") must always be tested, as the results can differ significantly.

• Neither PESQ-LQ nor 3SQM can be used reliably for objective voice QoS monitoring when multiple coders are used in tandem and at least one of them is a low bit-rate coder. However, PESQ-LQ after proper regression shows at least a reasonable correlation with the subjective data (0.84).

• In our experiment, male and female transmitted voices were subjectively evaluated almost equally.

• Objective methods underestimate MOS scores for female voice transmissions.
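The direction dependence noted in the second conclusion can be checked directly against the two-coder rows of the subjective results table. The sketch below uses the MOS-LQSn values from that table; the helper name `direction_gaps` and the choice to report each pair once, in alphabetical coder order, are illustrative.

```python
# MOS-LQSn for two-coder tandems, keyed by (first coder, second coder).
tandem_mos = {
    ("ACELP", "MELPe"): 1.30, ("MELPe", "ACELP"): 2.89,
    ("ACELP", "GSM"): 3.68, ("GSM", "ACELP"): 3.85,
    ("ACELP", "G.729"): 4.22, ("G.729", "ACELP"): 3.63,
    ("MELPe", "G.729"): 3.44, ("G.729", "MELPe"): 2.46,
    ("MELPe", "GSM"): 3.08, ("GSM", "MELPe"): 2.33,
}


def direction_gaps(scores):
    """Return MOS(A-to-B) minus MOS(B-to-A) once per coder pair."""
    gaps = {}
    for (a, b), forward in scores.items():
        if a < b and (b, a) in scores:
            gaps[(a, b)] = round(forward - scores[(b, a)], 2)
    return gaps


gaps = direction_gaps(tandem_mos)
# List the pairs with the largest direction dependence first.
for (a, b), gap in sorted(gaps.items(), key=lambda kv: -abs(kv[1])):
    print(f"{a}-{b} vs {b}-{a}: {gap:+.2f} MOS")
```

Several of the gaps are much larger than the confidence intervals reported for the individual conditions, which is why testing only one direction of a tandem can be misleading.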

Thank you for your attention!

http://measure.feld.cvut.cz

www.mesaqin.com
