39
MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment Hao-Wen Dong,* Wen-Yi Hsiao,* Li-Chia Yang, Yi-Hsuan Yang Music and AI Lab, Research Center of IT Innovation, Academia Sinica *these authors contributed equally to this work

MuseGAN: Multi-track Sequential Generative Adversarial ......MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and Accompaniment Hao-Wen

  • Upload
    others

  • View
    8

  • Download
    0

Embed Size (px)

Citation preview

  • MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and AccompanimentHao-Wen Dong,* Wen-Yi Hsiao,* Li-Chia Yang, Yi-Hsuan YangMusic and AI Lab, Research Center of IT Innovation, Academia Sinica

    *these authors contributed equally to this work

  • Outlines。Goals & Challenges。Data。Proposed Model。Results & Evaluation。Recent Work。Future Works

    2

    Source Code https://github.com/salu133445/museganDemo Page https://salu133445.github.io/musegan/

    https://github.com/salu133445/museganhttps://salu133445.github.io/musegan/

  • Goals & Challenges

  • [Source Code]https://github.com/salu133445/musegan

    [Demo Page]https://salu133445.github.io/musegan/

    Goals

    Generate pop music。of multiple tracks

    。in piano-roll format

    。using GAN with CNNs4

  • Challenge IMultitrack Interdependency

    5

    vocal

    piano

    bass

    drums

    strings

    music & clip by phycause

    Multi-track GAN

  • Challenge IIMusic Texture

    6

    melody

    chord(harmony)

    Convolutional Neural Networks

  • Challenge IIITemporal Structure

    7

    paragraph 1 paragraph 2 paragraph 3

    phrase 1 phrase 2 phrase 3 phrase 4

    bar 1 bar 2 bar 3 bar 4

    beat 1 beat 2 beat 3 beat 4

    step 1 step 2 ··· step 24

    song

    phrase 2

    4/4 time

  • Challenge IIITemporal Structure

    8

    bar 1 bar 2 bar 3 bar 4

    beat 1 beat 2 beat 3 beat 4

    step 1 step 2 ··· step 24

    Fixed Structure

    4/4 time

    Convolutional Neural Networks

  • Data

  • Data Representation

    10

    pitch

    time

    Piano-rollBar 1 Bar 2 Bar 3 Bar 4

    polyphonic multi-track

    time step

    (with symbolic timing)

  • Data Representation

    11

    pitch

    time

    Piano-rollBar 1 Bar 2 Bar 3 Bar 4

    polyphonic multi-track

    (with symbolic timing)A3

    t0 t1

  • Data Representation

    12

    Multi-track Piano-roll

    pitch

    time

    tracks

    polyphonic multi-track

    (with symbolic timing)

  • Data Representation

    13

    96 time steps

    84pitches 5 tracks

    4 bars

    a 4×96×84×5 tensor

    Drums

    GuitarPiano

    Strings

    Bass

  • Data

    LPD (Lakh Pianoroll Dataset)。>170,000 multi-track piano-rolls。Derived from Lakh MIDI Dataset。Mainly pop songs

    Pypianoroll (Python package)。Manipulation & Visualization。Efficient I/O。Parse/Write MIDI files。On PYPI (pip installable)

    14

    [Dataset]https://salu133445.github.io/lakh-pianoroll-dataset

    [Pypianoroll]https://salu133445.github.io/pypianoroll/

  • Proposed Model

  • D 1/0

    real samples

    Gz~pz G(z)

    random noise fake samples

    Generative Adversarial Networks

    16

    X~pX

  • D 1/0

    log(1-D(X)) + log(D(G(z)))

    log(1-D(G(z)))

    GeneratorMake G(z) indistinguishable

    from real data for DDiscriminator

    Tell G(z) as fake data from X being real ones

    real samples

    Gz~pz G(z)

    random noise fake samples

    Generative Adversarial Networks

    17

    X~pX

  • D 1/0

    log(1-D(X)) + log(D(G(z)))

    log(1-D(G(z)))

    GeneratorMake G(z) indistinguishable

    from real data for DDiscriminator

    Tell G(z) as fake data from X being real ones

    real samples

    Gz~pz G(z)

    random noise fake samples

    Generative Adversarial Networks

    18

    X~pX

    4-bar phrases of 5 tracks

  • MuseGAN – An Overview

    19

    Gtemp

    4 latent variables1 random noise

    TemporalGenerator

    BarGenerator

    4 piano-roll matrices

    Gbar

  • Bar Generator

    Generator

    20

    zzzzz

    zzzz

    zzzz

    GGGGG

  • Generator

    21

    z

    Bar Generator

    zzzzzzzzz

    zzzz

    GGGGGNo Coordination

    Coordination

    track-dependent

    track-independent

  • zzzzz

    Generator

    22

    z

    Bar GeneratorGz

    GGGGG

    zzzz

    zzzzzzzzz

    zzzz

    GGGGG

    Temporal Generator

  • zzzzz

    MuseGAN

    23

    z

    Bar GeneratorGz

    GGGGG

    zzzz

    zzzzzzzzz

    zzzz

    GGGGG

    TimeDependent Independent

    TrackDependent Melody Groove

    Independent Chords Style

  • MuseGAN

    24

  • Network Architectures

    25

    zzzzz

    z

    Bar GeneratorGz

    GGGGG

    zzzz

    zzzzzzzzz

    zzzz

    GGGGG

    Input 32dense 1024reshape to 2, 1 × 512 channels 2, 1, 512transconv 512 2 × 1 2, 1 4, 1, 512transconv 256 2 × 1 2, 1 8, 1, 256transconv 256 2 × 1 2, 1 16, 1, 256transconv 128 2 × 1 2, 1 32, 1, 128transconv 128 3 × 1 3, 1 96, 1, 128transconv 64 1 × 7 1, 7 96, 7, 64transconv 𝑀𝑀 1 × 12 1, 12 96, 84,𝑀𝑀Output 96, 84 × 𝑀𝑀 channels

    Input 32transconv 1024 2 1 3, 1024transconv 32 3 1 4, 32Output 32

    Temporal Generator

  • Results

  • Results

    27

    More samples available on demo pagehttps://salu133445.github.io/musegan/

    Sample 1 Sample 2

    BassDrumsGuitarStringsPiano

    Step 0 Step 700 Step 2500 Step 7900

    Drum pattern

    Chords

    Bass lineBass

    Drums

    Guitar

    Strings

    Piano

    https://salu133445.github.io/musegan/

  • step2000 4000 6000 800010

    4

    106

    108

    1010

    1012

    0

    Negative Critic Loss

    Neg

    ativ

    e C

    ritic

    Los

    s

    Objective Metrics

    28

    # of Used Pitch Classes

    # of

    Use

    d Pi

    tch

    Cla

    sses

    step

    Qualified Note Rate

    Qua

    lifie

    d N

    ote

    Rat

    e

    step

    Monitor the Training

  • User Study

    29

    H: harmoniousR: rhythmicMS: musically structuredC: coherentOR: overall rating

    composer jamming hybrid

  • Accompaniment System

    30

    Generation from Scratch nothing 5-track

    Accompaniment System single-track 5-track

    Conditional GAN

  • Summary。MuseGAN

    。a novel GAN for multi-track sequence generation。multi-track, polyphonic music。human-AI cooperative scenario

    。Lakh Pianoroll Dataset (LPD) (new dataset)

    。Pypianoroll (new Python package)

    31

  • Recent Work

  • Known Issue。Naïve binarization methods can easily lead to

    overly-fragmented notes

    33

    raw

    Bernoulli sampling

    hardthresholding

  • BinaryMuseGAN。use binary neurons at the output layer of the generator。use straight-through estimator to estimate the gradients for the

    binary neurons (which involves nondifferentiable operation)

    34

    Generator’s outputs Real data

    MuseGAN real-valued binary-valued

    BinaryMuseGAN binary-valued binary-valued

    Hao-Wen Dong and Yi-Hsuan Yang, "Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation," to appear at ISMIR, 2018.

  • Qualitative Comparison

    35

    raw

    pretrained(+BS)

    pretrained(+HT)

    proposed(+SBNs)

    proposed(+DBNs)

    raw

    proposed (+SBNs) proposed (+DBNs)

    pretrained (+HT)pretrained (+BS)

  • Future Works

  • Future Works

    Full Song GenerationChallenges。hierarchical temporal structure。variable-length sequence generation

    37

    bar 1 bar 2 bar 3 bar 4

    beat 1 beat 2 beat 3 beat 4

    step 1 step 2 ··· step 24

    phrase 2

    paragraph 1 paragraph 2 paragraph 3

    phrase 1 phrase 2 phrase 3 phrase 4

    song

  • Future Works

    Cross-modal GenerationChallenge。cross-modal temporal interdependency

    Applications in Music。music + lyrics

    。music + video

    38

  • Q&AMuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and AccompanimentHao-Wen Dong,* Wen-Yi Hsiao,* Li-Chia Yang, Yi-Hsuan Yang

    Source Code https://github.com/salu133445/museganDemo Page https://salu133445.github.io/musegan/

    zzzzz

    z

    Bar GeneratorGz

    GGGGG

    zzzz

    zzzzzzzzz

    zzzz

    GGGGG

    Convolutional Generative Adversarial Networks with Binary Neurons for Polyphonic Music Generation Hao-Wen Dong and Yi-Hsuan Yang

    https://github.com/salu133445/museganhttps://salu133445.github.io/musegan/

    MuseGAN: Multi-track Sequential Generative Adversarial Networks for Symbolic Music Generation and AccompanimentOutlinesGoals & ChallengesGoalsChallenge I�Multitrack InterdependencyChallenge II�Music TextureChallenge III�Temporal StructureChallenge III�Temporal StructureDataData RepresentationData RepresentationData RepresentationData RepresentationDataProposed ModelGenerative Adversarial NetworksGenerative Adversarial NetworksGenerative Adversarial NetworksMuseGAN – An OverviewGeneratorGeneratorGeneratorMuseGANMuseGANNetwork ArchitecturesResultsResultsObjective MetricsUser StudyAccompaniment SystemSummaryRecent WorkKnown IssueBinaryMuseGANQualitative ComparisonFuture WorksFuture WorksFuture WorksQ&A