MuseGAN: Multi-track Sequential Generative Adversarial ......MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment Hao-Wen

MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and AccompanimentHao-Wen Dong,* Wen-Yi Hsiao,* Li-Chia Yang, Yi-Hsuan YangMusic and AI Lab, Research Center of IT Innovation, Academia Sinica

*these authors contributed equally to this work

Outlines。Goals & Challenges。Data。Proposed Model。Results & Evaluation。Recent Work。Future Works

2

Source Code https://github.com/salu133445/museganDemo Page https://salu133445.github.io/musegan/

https://github.com/salu133445/museganhttps://salu133445.github.io/musegan/

Goals & Challenges

[Source Code]https://github.com/salu133445/musegan

[Demo Page]https://salu133445.github.io/musegan/

Goals

Generate pop music。of multiple tracks

。in piano-roll format

。using GAN with CNNs4

Challenge IMultitrack Interdependency

5

vocal

piano

bass

drums

strings

music & clip by phycause

Multi-track GAN

Challenge IIMusic Texture

6

melody

chord(harmony)

Convolutional Neural Networks

Challenge IIITemporal Structure

7

paragraph 1 paragraph 2 paragraph 3

phrase 1 phrase 2 phrase 3 phrase 4

bar 1 bar 2 bar 3 bar 4

beat 1 beat 2 beat 3 beat 4

step 1 step 2 ··· step 24

song

phrase 2

4/4 time

Challenge IIITemporal Structure

8




Fixed Structure

4/4 time

Convolutional Neural Networks

Data Representation

10

pitch

time

Piano-rollBar 1 Bar 2 Bar 3 Bar 4

polyphonic multi-track

time step

(with symbolic timing)

Data Representation

11

pitch

time

Piano-rollBar 1 Bar 2 Bar 3 Bar 4


(with symbolic timing)A3

t0 t1

Data Representation

12

Multi-track Piano-roll

pitch

time

tracks


(with symbolic timing)

Data Representation

13

96 time steps

84pitches 5 tracks

4 bars

a 4×96×84×5 tensor

Drums

GuitarPiano

Strings

Bass

Data

LPD (Lakh Pianoroll Dataset)。>170,000 multi-track piano-rolls。Derived from Lakh MIDI Dataset。Mainly pop songs

Pypianoroll (Python package)。Manipulation & Visualization。Efficient I/O。Parse/Write MIDI files。On PYPI (pip installable)

14

[Dataset]https://salu133445.github.io/lakh-pianoroll-dataset

[Pypianoroll]https://salu133445.github.io/pypianoroll/

Proposed Model

D 1/0

real samples

Gz~pz G(z)

random noise fake samples

Generative Adversarial Networks

16

X~pX

D 1/0

log(1-D(X)) + log(D(G(z)))

log(1-D(G(z)))

GeneratorMake G(z) indistinguishable

from real data for DDiscriminator

Tell G(z) as fake data from X being real ones

real samples

Gz~pz G(z)



17

X~pX

D 1/0

log(1-D(X)) + log(D(G(z)))

log(1-D(G(z)))

GeneratorMake G(z) indistinguishable

from real data for DDiscriminator

Tell G(z) as fake data from X being real ones

real samples

Gz~pz G(z)



18

X~pX

4-bar phrases of 5 tracks

MuseGAN – An Overview

19

Gtemp

4 latent variables1 random noise

TemporalGenerator

BarGenerator

4 piano-roll matrices

Gbar

Bar Generator

Generator

20

zzzzz

zzzz

zzzz

GGGGG

Generator

21

z

Bar Generator

zzzzzzzzz

zzzz

GGGGGNo Coordination

Coordination

track-dependent

track-independent

zzzzz

Generator

22

z

Bar GeneratorGz

GGGGG

zzzz

zzzzzzzzz

zzzz

GGGGG

Temporal Generator

zzzzz

MuseGAN

23

z

Bar GeneratorGz

GGGGG

zzzz

zzzzzzzzz

zzzz

GGGGG

TimeDependent Independent

TrackDependent Melody Groove

Independent Chords Style

MuseGAN

24

Network Architectures

25

zzzzz

z

Bar GeneratorGz

GGGGG

zzzz

zzzzzzzzz

zzzz

GGGGG

Input 32dense 1024reshape to 2, 1 × 512 channels 2, 1, 512transconv 512 2 × 1 2, 1 4, 1, 512transconv 256 2 × 1 2, 1 8, 1, 256transconv 256 2 × 1 2, 1 16, 1, 256transconv 128 2 × 1 2, 1 32, 1, 128transconv 128 3 × 1 3, 1 96, 1, 128transconv 64 1 × 7 1, 7 96, 7, 64transconv 𝑀𝑀 1 × 12 1, 12 96, 84,𝑀𝑀Output 96, 84 × 𝑀𝑀 channels

Input 32transconv 1024 2 1 3, 1024transconv 32 3 1 4, 32Output 32

Temporal Generator

Results

Results

27

More samples available on demo pagehttps://salu133445.github.io/musegan/

Sample 1 Sample 2

BassDrumsGuitarStringsPiano

Step 0 Step 700 Step 2500 Step 7900

Drum pattern

Chords

Bass lineBass

Drums

Guitar

Strings

Piano

https://salu133445.github.io/musegan/

step2000 4000 6000 800010

4

106

108

1010

1012

0

Negative Critic Loss

Neg

ativ

e C

ritic

Los

s

Objective Metrics

28

# of Used Pitch Classes

# of

Use

d Pi

tch

Cla

sses

step

Qualified Note Rate

Qua

lifie

d N

ote

Rat

e

step

Monitor the Training

User Study

29

H: harmoniousR: rhythmicMS: musically structuredC: coherentOR: overall rating

composer jamming hybrid

Accompaniment System

30

Generation from Scratch nothing 5-track

Accompaniment System single-track 5-track

Conditional GAN

Summary。MuseGAN

。a novel GAN for multi-track sequence generation。multi-track, polyphonic music。human-AI cooperative scenario

。Lakh Pianoroll Dataset (LPD) (new dataset)

。Pypianoroll (new Python package)

31

Recent Work

Known Issue。Naïve binarization methods can easily lead to

overly-fragmented notes

33

raw

Bernoulli sampling

hardthresholding

BinaryMuseGAN。use binary neurons at the output layer of the generator。use straight-through estimator to estimate the gradients for the

binary neurons (which involves nondifferentiable operation)

34

Generator’s outputs Real data

MuseGAN real-valued binary-valued

BinaryMuseGAN binary-valued binary-valued

Hao-Wen Dong and Yi-Hsuan Yang, "Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation," to appear at ISMIR, 2018.

Qualitative Comparison

35

raw

pretrained(+BS)

pretrained(+HT)

proposed(+SBNs)

proposed(+DBNs)

raw

proposed (+SBNs) proposed (+DBNs)

pretrained (+HT)pretrained (+BS)

Future Works

Future Works

Full Song GenerationChallenges。hierarchical temporal structure。variable-length sequence generation

37




phrase 2

paragraph 1 paragraph 2 paragraph 3

phrase 1 phrase 2 phrase 3 phrase 4

song

Future Works

Cross-modal GenerationChallenge。cross-modal temporal interdependency

Applications in Music。music + lyrics

。music + video

38

Q&AMuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and AccompanimentHao-Wen Dong,* Wen-Yi Hsiao,* Li-Chia Yang, Yi-Hsuan Yang

Source Code https://github.com/salu133445/museganDemo Page https://salu133445.github.io/musegan/

zzzzz

z

Bar GeneratorGz

GGGGG

zzzz

zzzzzzzzz

zzzz

GGGGG

Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation Hao-Wen Dong and Yi-Hsuan Yang

https://github.com/salu133445/museganhttps://salu133445.github.io/musegan/

MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and AccompanimentOutlinesGoals & ChallengesGoalsChallenge I�Multitrack InterdependencyChallenge II�Music TextureChallenge III�Temporal StructureChallenge III�Temporal StructureDataData RepresentationData RepresentationData RepresentationData RepresentationDataProposed ModelGenerative Adversarial NetworksGenerative Adversarial NetworksGenerative Adversarial NetworksMuseGAN – An OverviewGeneratorGeneratorGeneratorMuseGANMuseGANNetwork ArchitecturesResultsResultsObjective MetricsUser StudyAccompaniment SystemSummaryRecent WorkKnown IssueBinaryMuseGANQualitative ComparisonFuture WorksFuture WorksFuture WorksQ&A

Documents

MuseGAN: Multi-track Sequential Generative Adversarial ......MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment Hao-Wen