Singing Voice Resynthesis using Concatenation-based Techniques Nuno Miguel da Costa Santos Fonseca

Source: paginas.fe.up.pt/~voicestudies/.../meetings/...v2.pdf

Goal

• Develop, create or adapt techniques for singing resynthesis
  - User’s voice to control singing synthesis
  - Automatically recreate the same music and lyrics performance
• Using Sound and Music Computing (SMC) approaches:
  - Merging speech and music research
  - Sampling-based synthesis

Applications

• Audio effect
  - Using an FX unit as a synthesizer, choosing the output sound
• New UI approach for singing synthesis
  - Using the user’s voice to control a singing synthesizer
• Several contexts:
  - Restoration
  - Transynthesis
  - Instrument enhancer

Singing Resynthesis

• Need to refocus on main goal (singing voice)
• Replication of three domains:
  - Phonetics
  - Pitch
    • Melodic line
    • Pitch-related effects (e.g. portamento, vibrato)
  - Dynamics (sound intensity)
    • e.g. crescendo

Synthesis

• Sample concatenation
  - Phase alignment
    • Offset tests with correlation
• Minor pitch adjustments
  - Interpolation
    • Simple
    • No significant artifacts for small pitch changes
  - Pitch smoothing
    • Better continuity during overlapping
• Frame energy for dynamics
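The phase-alignment and interpolation steps on this slide can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the normalized-correlation score and the linear interpolator are assumptions chosen for clarity. Plain Python is used to keep it self-contained; an array library would be the usual choice for real audio.

```python
import math

def best_offset(tail, head, max_offset):
    """Offset test with correlation (illustrative).

    `tail` is the end of the previous unit, `head` the start of the next one.
    Every offset in [0, max_offset] is tried and the one whose segment of
    `head` has the highest normalized correlation with `tail` is returned.
    """
    n = len(tail)
    tail_energy = sum(a * a for a in tail)
    best, best_score = 0, -math.inf
    for offset in range(max_offset + 1):
        segment = head[offset:offset + n]
        if len(segment) < n:
            break  # not enough samples left in `head`
        dot = sum(a * b for a, b in zip(tail, segment))
        norm = math.sqrt(tail_energy * sum(b * b for b in segment))
        score = dot / norm if norm > 0 else 0.0
        if score > best_score:
            best, best_score = offset, score
    return best

def resample_linear(samples, ratio):
    """Minor pitch adjustment by linear interpolation.

    Reads the input at steps of `ratio` samples: ratio > 1 raises the pitch
    (and shortens the sound), ratio < 1 lowers it. For a shift expressed in
    semitones, ratio = 2 ** (semitones / 12).
    """
    out = []
    pos = 0.0
    while pos < len(samples) - 1:
        i = int(pos)
        frac = pos - i
        out.append((1.0 - frac) * samples[i] + frac * samples[i + 1])
        pos += ratio
    return out

def crossfade(tail, head):
    """Linear crossfade between two equally long, phase-aligned segments."""
    n = len(tail)
    return [tail[i] * (1 - i / n) + head[i] * (i / n) for i in range(n)]
```

A typical use would be to call `best_offset` on the overlap region of two units, drop that many samples from the start of the second unit, and then `crossfade` the overlap.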

Pitch and Dynamics Extraction

• Pitch extraction based on the YIN method
  - YIN method
  - Median smoothing
  - Aperiodicity evaluation and decision
• Dynamic information based on energy
  - Simple, but efficient
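As a rough illustration of this extraction chain, the sketch below implements a basic YIN estimator (difference function, cumulative mean normalized difference, absolute threshold), median smoothing of the pitch track, a voicing decision from the aperiodicity value, and frame energy. Parameter values such as `threshold=0.1`, `max_aperiodicity=0.2` and the search range are assumptions, not values from the slides, and refinements of the full YIN method (parabolic interpolation, best local estimate) are omitted.

```python
from statistics import median

def yin_frame(frame, sr, fmin=80.0, fmax=1000.0, threshold=0.1):
    """Return (f0_hz, aperiodicity) for one frame using a basic YIN."""
    tau_min = max(1, int(sr / fmax))
    tau_max = int(sr / fmin)
    w = len(frame) - tau_max  # integration window length
    if w <= 0:
        raise ValueError("frame too short for the requested fmin")
    # Difference function d(tau)
    d = [0.0] * (tau_max + 1)
    for tau in range(1, tau_max + 1):
        d[tau] = sum((frame[j] - frame[j + tau]) ** 2 for j in range(w))
    # Cumulative mean normalized difference d'(tau), with d'(0) = 1
    cmnd = [1.0] * (tau_max + 1)
    running = 0.0
    for tau in range(1, tau_max + 1):
        running += d[tau]
        cmnd[tau] = d[tau] * tau / running if running > 0 else 1.0
    # First dip below the threshold, followed down to its local minimum
    for tau in range(tau_min, tau_max):
        if cmnd[tau] < threshold:
            while tau + 1 <= tau_max and cmnd[tau + 1] < cmnd[tau]:
                tau += 1
            return sr / tau, cmnd[tau]
    # No clear dip: fall back to the global minimum (likely unvoiced)
    tau = min(range(tau_min, tau_max + 1), key=lambda t: cmnd[t])
    return sr / tau, cmnd[tau]

def is_voiced(aperiodicity, max_aperiodicity=0.2):
    """Aperiodicity decision; the 0.2 limit is an assumed value."""
    return aperiodicity < max_aperiodicity

def median_smooth(values, width=5):
    """Median filter over a pitch track; windows shrink at the edges."""
    half = width // 2
    return [median(values[max(0, i - half): i + half + 1])
            for i in range(len(values))]

def frame_energy(frame):
    """Mean squared amplitude of a frame, used as the dynamics measure."""
    return sum(x * x for x in frame) / len(frame)
```

The median filter is what removes isolated octave errors from the pitch track: a single outlier inside a window of otherwise consistent values never becomes the median.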

Replicating Phonemes

• Several possibilities tested:
  - Phoneme extraction
    • NN, SVM, HMM
  - Phonetic typewriter
    • NN SOM
  - Phonetic similarity
    • Euclidean distance
• Within “Singing Resynthesis”, phonetic similarity presented the best results

Measuring Phonetic Similarity

• Target and concatenation cost
• Target cost:
  - Euclidean distance with normalized differences within four domains:
    • MFCC
    • LPC frequency response
    • LPC Itakura-Saito distance (symmetrical)
    • Aperiodicity (YIN)
• Concatenation cost
  - LPC frequency response
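One possible reading of these two costs is sketched below. The frame layout (a dict with `mfcc`, `lpc_response`, `lpc_power` and `aperiodicity` keys), the per-domain `scales` used for normalization, and the use of a Euclidean distance between LPC responses for the concatenation cost are all assumptions made for illustration; the slides do not specify them.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equally long feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def itakura_saito_symmetric(p, q):
    """Symmetrized Itakura-Saito distance between two positive power spectra."""
    def d_is(a, b):
        return sum(x / y - math.log(x / y) - 1.0 for x, y in zip(a, b)) / len(a)
    return 0.5 * (d_is(p, q) + d_is(q, p))

def target_cost(target, candidate, scales):
    """Combine four normalized per-domain differences into one distance.

    Each difference is divided by a scale (for example its typical value over
    the database, an assumed normalization) so that no domain dominates, and
    the four results are combined as a Euclidean norm.
    """
    diffs = [
        euclidean(target["mfcc"], candidate["mfcc"]) / scales["mfcc"],
        euclidean(target["lpc_response"], candidate["lpc_response"])
        / scales["lpc_response"],
        itakura_saito_symmetric(target["lpc_power"], candidate["lpc_power"])
        / scales["itakura_saito"],
        abs(target["aperiodicity"] - candidate["aperiodicity"])
        / scales["aperiodicity"],
    ]
    return math.sqrt(sum(d * d for d in diffs))

def concatenation_cost(prev_unit, next_unit):
    """Spectral mismatch at the join, measured on the LPC frequency response."""
    return euclidean(prev_unit["lpc_response"], next_unit["lpc_response"])
```

The target cost compares a frame of the user's voice with a candidate frame from the singer database, while the concatenation cost compares two database frames that would end up next to each other in the output.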

Unit Selection

• Searching for the sequence with the lowest total cost
  - Simple for target cost
  - Complex for both target and concatenation cost
• Several tests, including:
  - Heuristics
  - Heuristics with segmentation
  - Viterbi
  - Viterbi with pruning
• Final solution: Viterbi with 10% pruning
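The search can be sketched as a Viterbi pass over pruned candidate lists. The slides do not say how the 10% is applied, so this example assumes that, for each target frame, only the 10% of database units with the lowest target cost are kept (`keep_fraction=0.10`). The function name and argument layout are likewise hypothetical.

```python
def viterbi_select(target_costs, concat_cost, keep_fraction=0.10):
    """Pick one database unit per target frame with minimum total cost.

    target_costs[t][u] is the target cost of unit u at frame t.
    concat_cost(u, v) is the cost of placing unit v right after unit u.
    Returns the chosen unit indices, one per frame.
    """
    n_units = len(target_costs[0])
    keep = max(1, int(round(keep_fraction * n_units)))
    # Pruning (assumed interpretation): keep the cheapest candidates per frame
    candidates = [
        sorted(range(n_units), key=lambda u: frame[u])[:keep]
        for frame in target_costs
    ]
    # best[u] = lowest total cost of any path ending in unit u
    best = {u: target_costs[0][u] for u in candidates[0]}
    back = []  # back[t - 1][v] = best predecessor of v at frame t
    for t in range(1, len(target_costs)):
        new_best, pointers = {}, {}
        for v in candidates[t]:
            prev = min(best, key=lambda u: best[u] + concat_cost(u, v))
            new_best[v] = best[prev] + concat_cost(prev, v) + target_costs[t][v]
            pointers[v] = prev
        best = new_best
        back.append(pointers)
    # Backtrack from the cheapest final state
    path = [min(best, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path
```

With target cost alone the best unit can be chosen independently per frame, which is why that case is simple. Once the concatenation cost couples neighbouring choices, dynamic programming is needed, and pruning keeps the number of transitions evaluated per frame manageable.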

Proposed Method

Improving Sound Quality

• Reducing the number of transitions due to pitch
  - Transitions forced by pitch variations
  - Pitch tolerance: 0.5 → 1.5 semitones
• Reducing the number of transitions due to phonetics
  - 50% more weight on concatenation cost
  - Adjacent concatenation cost: 0 → -0.5
• Discarding internal low-energy frames
• Preventing frame repetitions
• Considering the effects of time shifts
  - Interpolation/phase alignment will create time shifts → incorrect concatenation costs
  - Interpolate worst concatenation scenario
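The first two adjustments can be made concrete with a small sketch. It assumes that "adjacent" means two units that follow each other in the original recording, that the -0.5 value replaces the concatenation cost for such pairs, and that the 50% extra weight applies to all other pairs. These readings, and the function names, are assumptions rather than details given on the slide.

```python
import math

def semitones_between(f1_hz, f2_hz):
    """Absolute pitch distance between two frequencies, in semitones."""
    return abs(12.0 * math.log2(f1_hz / f2_hz))

def within_pitch_tolerance(unit_f0_hz, target_f0_hz, tolerance=1.5):
    """True if the unit can be kept and reached by a minor pitch adjustment.

    Raising the tolerance from 0.5 to 1.5 semitones lets a unit cover more of
    the target melody before a transition to another unit is forced.
    """
    return semitones_between(unit_f0_hz, target_f0_hz) <= tolerance

def adjusted_concat_cost(prev_index, next_index, raw_cost,
                         weight=1.5, adjacent_cost=-0.5):
    """Concatenation cost biased towards staying on consecutive frames.

    prev_index and next_index are positions in the source recording.
    Consecutive frames get a negative cost (a reward); every other pair gets
    the raw cost with 50% more weight.
    """
    if next_index == prev_index + 1:
        return adjacent_cost
    return weight * raw_cost
```

Both changes push the unit selection towards longer runs of consecutive frames, which means fewer joins and therefore fewer places where artifacts can appear.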

Some examples

• Amazing Grace (LeAnn Rimes)
• Frozen (Madonna)
• Tom’s Diner (Suzanne Vega)
• Whenever (Shakira)

Conclusions

• Although the concept of resynthesis is simple
  - Its implementation is very complex
  - High potential for future applications
• Much more research work is required
  - Artifacts still prevent its use in professional applications
• Main obstacles:
  - Absence of a singing dataset for research purposes
  - Lack of a numeric metric to evaluate resynthesis results
