Improvised Symbolic Interaction: A Creative Perspective on Similarity
Gérard Assayag IRCAM
Sciences and Technologies of Music and Sound (STMS) Lab Music Representation Team
What are MIR / similarity studies useful for (from a creative perspective)?
• keywords : find, retrieve, match, identify, align, query / compose, generate, improvise, interact, synthesize, replace, recombine
• from a creative perspective, off-line MIR is used for composition, on-line MIR for direct interaction (Machine Improvisation)
• Going from on-line MIR to Machine Audition (artificial listening) to Machine Cognition (Machine Musicianship, Rowe 2001)
• Equip autonomous creative agents with Machine Audition and Machine Cognition capabilities
• Escape from the Song (Western pop song) paradigm: free form, composition, improvisation
Compositional Example: orchestral texture derived from the spectral analysis of a single sound
• Example 1: Partiels by Gérard Grisey, 1975
• Historical example considered at the origin of “Spectral Music”
• The source is the low E note of the trombone. The transformation is an exploration of the spectral content of this note by the whole orchestra
• Literal reconstruction, or transformations inspired by electro-acoustic operators (distortion, ring modulation, filtering, etc.)
• Highlights the need for Computer Assisted Composition Environments
Orchestration: ANR Project Sample Orchestrator
PhD G. Carpentier, Dec. 2008: A Computational Approach to Musical Orchestration
Architecture: Orchis (client), Orchidée (server), OrchDB (database); Multi-Objective Optimization
PhD P. Esling, 2013: Multi-Objective Time Series Matching
“Dynamic Musical Orchestration Using Genetic Algorithms and a Spectro-Temporal Description of Musical Instruments”, Applications of Evolutionary Computation, LNCS vol. 6025, 2010. “Time series data mining and analysis”, ACM Computing Surveys, 2011 (accepted proof). “Intelligent sound samples database with multiobjective time series matching”, IEEE Transactions on Audio, Speech and Language Processing, 2012.
A complex sound target
Getting it from scratch
The OMax Project featuring the OMax Brothers
Marc Chemillier
Georges Bloch
Gérard Assayag
Benjamin Lévy
Shlomo Dubnov
Laurent Bonnasse-Gahot
Jérôme Nika
Fivos Maniatakos
Style Modeling
Markov, LZ-IP / PST: Dubnov, Assayag et al. [1998-2003]
Conklin [2003]
OMax Paradigm
• Listen, learn, model, generate loop
• Assayag, Chemillier, Bloch (2003)
• Factor Oracle
The OMax Paradigm & family
• MIMI: François [2007]
• Native Alien: Bhagwati [2011]
• PyOracle: Surges & Dubnov [2013]
• Continuator: Pachet [2002-2013]
• OM-LZ: Assayag [2000]
• ImproTeK: Chemillier & Nika [2013]
• OMax: Assayag, Chemillier, Bloch [2004-2013]; Lévy PhD [2013]
• SoMax: Bonnasse-Gahot, Assayag, Bloch [2013]
• Impro Control: Donzé, Seshia, Wessel [2014]
• Audio Oracle / Information Rate / Information Geometry: Dubnov, Assayag, Cont [2007-2012]
• Formal grammars for jazz impro: Chemillier [2000]
A Virtual Musician Who Learns on the Fly
• Before learning one must listen: efficient machine listening
‣ perception-aware signal segmentation into musical units distributed over some geometry
‣ cognition-aware discovery of an alphabet structure on the musical units, delivering a symbolic stream
• Learn a stylistic sequence model incrementally, into a formal representation, directly from the stream of symbols
• Generate and render new sequences by navigating the model
• These three processes (Listen, Learn, Generate) run in real time and concurrently (competitively and cooperatively): a unit played by the musician is recognized and integrated into the model within a few milliseconds (close to human performance)
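The three concurrent processes can be sketched as a minimal producer/consumer pipeline. This is a deliberately skeletal illustration of the architecture only: the plain list `model`, the `append` update, and the tail-copy generation are illustrative placeholders for the real segmentation and oracle machinery.

```python
import queue
import threading

units = queue.Queue()           # listener -> learner channel
model = []                      # shared sequence model (placeholder)
model_lock = threading.Lock()

def listen(stream):
    """Segment the input stream into musical units (placeholder)."""
    for unit in stream:         # in reality: real-time audio segmentation
        units.put(unit)
    units.put(None)             # end-of-stream sentinel

def learn():
    """Integrate each incoming unit into the model within milliseconds."""
    while (unit := units.get()) is not None:
        with model_lock:
            model.append(unit)  # in reality: incremental Factor Oracle update

def generate(n):
    """Navigate the model to produce a new sequence."""
    with model_lock:
        return model[-n:]       # in reality: oracle navigation / recombination
```

The point of the sketch is only that Listen and Learn are decoupled through a thread-safe queue while Generate reads the shared model concurrently, which is the competitive/cooperative scheme described above.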
Modeling Improvised Interaction as Stylistic Reinjection
[Diagram: the improvising subject, in the present, is listening to fellow improvisers and to itself; the past (memory) holds the learned repertoire, used for evoking and predicting; avatars of the subject and of the fellows are built by memorizing, modeling and compressing]
OMax reifies the virtual past
IPG, based on [Lempel, Ziv, 78]
• Dict = {}; S: sequence
• While S ≠ ε:
‣ S = pu, with p = shortest prefix of S ∉ Dict
‣ Dict = Dict ∪ {p}
‣ S = u
(add the shortest prefix not already in the dictionary, then move forward)
Optimal coding: the average code length for contexts converges to the entropy of the source:
lim (n→∞) c(n)·log₂(c(n)) / n = H
where n is the sequence length, c(n) the number of contexts (phrases), and log₂(c(n)) the average code-word length.
Universal predictor: adaptively combines the predictability of Markov models of increasing order; converges to an optimal coding without a-priori knowledge of the statistical model of the source, and outperforms any fixed-order Markov predictor.
Variable Memory (adaptive) Markov Models: context-based methods in statistical learning
a b a b a c a b a a b c
Dict = {a, b, ab, ac, aba, abc}
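The incremental parse above fits in a few lines; this is a direct transcription of the pseudo-code, not an optimized implementation (a real one would use a trie rather than membership tests on a list).

```python
def lz78_parse(s):
    """Incremental parsing (LZ78): repeatedly strip the shortest
    prefix of the remaining sequence not yet in the dictionary."""
    dictionary = []
    i = 0
    while i < len(s):
        j = i + 1
        # grow the prefix until it is new (or the sequence ends)
        while j <= len(s) and s[i:j] in dictionary:
            j += 1
        dictionary.append(s[i:j])
        i = j
    return dictionary
```

On the example sequence it reproduces exactly the dictionary above: `lz78_parse("ababacabaabc")` yields `['a', 'b', 'ab', 'ac', 'aba', 'abc']`.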
[Figure: context tree built over the dictionary]
Prediction (continuation) with implicit probability distribution. Shape creation through context prediction is equivalent to navigation in the compressed representation.
PST: retain continuation t for context abc iff P(t | abc) > Pmin and P(t | abc) is significantly different from P(t | bc)
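The PST retention rule can be sketched as follows; the count table, `p_min`, and the multiplicative significance ratio are illustrative stand-ins (an actual PST learner derives its significance test from the training-set statistics).

```python
def retain(counts, context, parent, t, p_min=0.01, ratio=1.05):
    """PST pruning rule (sketch): keep continuation t for `context`
    iff P(t | context) > p_min and P(t | context) differs enough
    from P(t | parent), the one-symbol-shorter context.
    `counts` maps context -> {symbol: count}; thresholds are illustrative."""
    def prob(ctx, sym):
        table = counts.get(ctx, {})
        total = sum(table.values())
        return table.get(sym, 0) / total if total else 0.0

    p_full, p_short = prob(context, t), prob(parent, t)
    significant = p_full >= ratio * p_short or p_full <= p_short / ratio
    return p_full > p_min and significant
```

For instance, with counts `{"abc": {"d": 8, "e": 2}, "bc": {"d": 5, "e": 5}}`, continuation "d" is retained for "abc" (0.8 vs 0.5), while it would be pruned if both contexts predicted it equally.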
(Papentin L2 level complexity) The lower the L2 complexity, the more generative
Geraint: similarity is related to prediction. Daniel Müllensiefen: compression distance performs well in evaluating similarity.
Listening Strategies
• Monodic signal: pitch follower (Yin~, with its quality function), or MIDI (Moog PianoBar key detection, MIDI Helper)
• Polyphonic MIDI processing: Slicer / Unslicer
• Arbitrarily complex signal: audio descriptors
‣ stream of frame-wise perceptual spectral descriptors (MFCC, chromagrams)
‣ adaptive quantization of descriptor vectors
‣ local grouping / averaging -> musical tokens
‣ adaptive KNN -> symbolic alphabet
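A minimal sketch of the quantization step: each descriptor vector either joins its nearest existing cluster or founds a new symbol. The fixed distance threshold is a crude stand-in for the adaptive KNN machinery.

```python
import math

def quantize(frames, threshold):
    """Map each descriptor vector to a symbol: reuse the nearest
    existing centroid if within `threshold`, else create a new symbol.
    A simplified stand-in for the adaptive KNN / clustering step."""
    centroids, symbols = [], []
    for v in frames:
        dists = [math.dist(v, c) for c in centroids]
        if dists and min(dists) < threshold:
            symbols.append(dists.index(min(dists)))
        else:
            centroids.append(v)       # found a new cluster / symbol
            symbols.append(len(centroids) - 1)
    return symbols
```

Applied to frames `[(0, 0), (0.1, 0), (5, 5), (5.1, 5), (0, 0.1)]` with threshold 1.0 it emits the symbol stream `[0, 0, 1, 1, 0]`: nearby descriptor vectors collapse onto the same alphabet letter.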
Information Geometry
• Machine listening in an information geometry framework: distributing sound frames over Riemannian manifolds, with information metrics over parametric exponential probability spaces
• Points are probability distributions (approximated as frequency-domain descriptors)
• Distances are Bregman divergences: they amount to a relative entropy between perceptual descriptors
• Bregman balls cluster points into stable musical units, abstracted as symbols in a formal language alphabet
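In the special case where the Bregman generator is negative entropy, the divergence is the Kullback-Leibler relative entropy. A minimal sketch of the divergence and the ball-membership test used for clustering (descriptor histograms are assumed normalized):

```python
import math

def kl_divergence(p, q):
    """Relative entropy D(p || q): the Bregman divergence generated by
    negative entropy. p, q are normalized descriptor histograms."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def in_bregman_ball(center, point, radius):
    """Cluster test: does `point` fall inside the Bregman ball of `center`?"""
    return kl_divergence(center, point) <= radius
```

Note the asymmetry: D(p || q) ≠ D(q || p) in general, which is why Bregman balls (rather than metric balls) are the natural clustering primitive here.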
Cont A., Dubnov S., and Assayag G., On the Information Geometry of Audio Streams with Applications to Similarity Computing, IEEE Transactions on Audio, Speech and Language Processing, Vol. 19, no. 4, Pp. 837-846, May 2011.
[Figure: audio frames χ₁…χ₅ mapped to model points v₁…v₅ in Euclidean space; mutation of a frame illustrated]
Factor Oracle + Suffix Link Trees
A A B B C C D D A B C C
Listen + Learn: Factor Oracle and Suffix Link Tree; signal / symbolic articulation
A. Cont, S. Dubnov: information geometry + similarity on Beethoven’s First Piano Sonata, played by Gulda in 1958
Suffix link trees form a forest of trees fully explaining the algebraic partial order of patterns in the learned sequence
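The on-line Factor Oracle construction (Allauzen, Crochemore & Raffinot, 1999) is short enough to sketch in full; the suffix links it computes are exactly the pointers that, read in reverse, form the suffix link trees:

```python
def factor_oracle(word):
    """On-line Factor Oracle construction: returns forward transitions
    and suffix links for `word`. State 0 is the initial state; reading
    symbol i of the word leads from state i to state i+1."""
    trans = [dict()]              # trans[i][symbol] -> target state
    sfx = [-1]                    # suffix link per state (root -> -1)
    for i, a in enumerate(word):
        trans.append(dict())
        sfx.append(0)
        trans[i][a] = i + 1       # new forward transition
        k = sfx[i]
        # propagate the new symbol back along the suffix-link chain
        while k > -1 and a not in trans[k]:
            trans[k][a] = i + 1
            k = sfx[k]
        sfx[i + 1] = 0 if k == -1 else trans[k][a]
    return trans, sfx
```

On "aabb" this yields suffix links `[-1, 0, 1, 0, 3]`: state 2 links to state 1 (repeated "a"), state 4 to state 3 (repeated "b"), which is the pattern structure OMax navigates at generation time.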
OMax versatility
Dr. Dre & Snoop Dogg
Rhythm and pulse derive from syllable- and prosodic-level analysis of similarities
Rhythm and pulse derive from off-line learning of the multi-track (beat-aligned) studio sessions
Going SoMax: listen carefully
(ANR SOR2, post-doc L. Bonnasse-Gahot)
Durational accent: see Parncutt, 1994
[Figure: piano-roll, pitch C2-C6 vs. time 0-12 s]
Toiviainen & Krumhansl, 2003: echoic memory and leaky integration
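Leaky integration in the Toiviainen & Krumhansl sense can be sketched as an exponentially decaying accumulator; `dt` (frame period) and `tau` (memory time constant) are free parameters to tune per listening layer.

```python
import math

def leaky_integrate(inputs, dt, tau):
    """Echoic-memory model as leaky integration: the accumulated
    activation decays exponentially with time constant `tau` while
    each new input sample is added in."""
    decay = math.exp(-dt / tau)
    state, trace = 0.0, []
    for x in inputs:
        state = state * decay + x
        trace.append(state)
    return trace
```

A single impulse thus leaves an exponentially fading trace: with `dt == tau`, the response to `[1, 0, 0]` is `1, e⁻¹, e⁻²`, which is the "echo" that keeps recent events active while older ones fade.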
Live input
Musical Memory
Annotation Layer
Phase Layer
Listen to several channels (foreground, background, voices, etc.): one for the main memory model, the others for automatic annotation. Create e.g. a solo memory model plus a loose harmonic / textural annotation, or the other way round, or both. At generation time, match the annotations with features extracted from the input.
Going SoMax: Autumn in Köln
[Figure: piano-roll, pitch C2-C6 vs. time 0-12 s]
Memory Activation Scheme
• Fuzzy pattern matching escapes the purely Markovian sequence logic
• Addresses the cartographical blindness, evidence accumulation, and cognitive persistence questions
• Sums up the activation profiles of parallel annotation views (A, B, C), including self-listening
• Self-listening: listening to the memory, or to the generation
• Beat/phase profile; flexible time
[Figure: SoMax activation profiles at time t and t+i; summed views A+B, A+B+C]
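The profile-summation step can be sketched as follows: each annotation view contributes a weighted activation profile over memory positions, and generation jumps to the peak of the sum. The dict representation and the names are illustrative, not SoMax's actual data structures.

```python
def merge_activations(profiles, weights):
    """Sum weighted activation profiles from parallel annotation views
    (e.g. melodic, harmonic, self-listening). Each profile maps a
    memory position to an activation level."""
    total = {}
    for profile, w in zip(profiles, weights):
        for pos, act in profile.items():
            total[pos] = total.get(pos, 0.0) + w * act
    return total

def best_position(total):
    """Pick the memory position with the highest summed activation."""
    return max(total, key=total.get)
```

E.g. a melodic view `{1: 0.2, 5: 0.9}` and a harmonic view `{5: 0.3, 7: 0.4}` reinforce each other at position 5, which wins: evidence accumulated across views, rather than a single Markov continuation, selects the next jump.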
SoMax
In the mood of Time Remembered (Rémi Fox + Bill Evans): one agent trained on Bill Evans's music; note-by-note generation, flexible time adjustment, melodic listening. Right anticipation at 01:20.
Schoenberg revisited: one agent, A1, is trained off-line on a corpus of Schoenberg's Drei Klavierstücke, Op. 11. A1 improvises with melodic listening to Rémi's improvisation. A second agent, A2, learns on the fly from Rémi's improvised audio stream, with an additional harmonic view coming from A1's improvisation. In the second part (2:36), A1 and A2 improvise together: A2 listens to A1's harmony, A1 listens to A2's melody. Rémi and Laurent get back into the game.