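黄瓒 (Huang Zan), Deep Learning Solutions Architect @ Nvidia: An End-to-End Speech Synthesis Acceleration Solution Based on TACOTRON2 and WAVEGLOW

An End-to-End Speech Synthesis Acceleration Solution Based on TACOTRON2 and WAVEGLOW ...R. Prenger, R. Valle and B. Catanzaro, "Waveglow: A Flow-based Generative Network for Speech Synthesis," ICASSP 2019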

Page 1

黄瓒 (Huang Zan), Deep Learning Solutions Architect @ Nvidia

An End-to-End Speech Synthesis Acceleration Solution Based on TACOTRON2 and WAVEGLOW

Page 2

AGENDA

Background

Overview of end-to-end speech synthesis based on Tacotron2 and WaveGlow

Vocoder

Introducing WaveGlow, a vocoder based on deep neural networks

Acceleration Solution

Using TensorRT, together with Tacotron2, to accelerate model inference on Nvidia GPUs

Page 3

Background

Page 4

Speech Synthesis (Text-to-Speech)

Speech recognition

• Smart home
• Meeting transcription
• Content retrieval
• Command recognition
• Real-time translation
• ...

Speech synthesis

• In-car navigation
• Telephone customer service
• Virtual idols
• Audiobooks
• Bedtime stories
• ...

A technology-driven, more natural and more efficient way for humans and machines to interact

Page 5

End-to-End?

Text: 苏州是个美丽的城市! ("Suzhou is a beautiful city!")

A single model handles the complex processing pipeline, lowering the barrier to entry for speech synthesis => data + compute ≈ ?

Deep neural networks produce better synthesis results => higher audio quality, reaching more use cases

Page 6

Split in Two

Feature prediction (Tacotron2)

• Characters/phonemes -> mel spectrogram

Vocoder (WaveGlow)

• Mel spectrogram -> waveform

Character/phoneme sequence -> intermediate representation (mel spectrogram) -> waveform

Page 7

Vocoder

Page 8

Better Audio Quality + Faster Speed?

High sample rates: 16 kHz = OK, 22 kHz = GOOD, 24 kHz = BETTER

Strong temporal dependencies: some autoregressive neural network methods need several hours to generate a dozen or so seconds of speech

Algorithm design: reduce autoregressive structure and increase parallelism -> let convolutional layers do more of the work

Exploit the hardware fully: platform-specific optimization to lower latency and raise throughput
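
To make the sequential cost above concrete, here is a minimal sketch that counts the samples a vocoder has to produce. It assumes that "22 kHz" means the common 22,050 Hz rate and that a sample-level autoregressive model needs one network evaluation per output sample; the function name is illustrative and not from the talk.

```python
# Sample rates from the slide; reading "22 kHz" as 22_050 Hz is an assumption.
SAMPLE_RATES_HZ = {"16 kHz": 16_000, "22 kHz": 22_050, "24 kHz": 24_000}


def samples_needed(duration_s: float, sample_rate_hz: int) -> int:
    """Number of waveform samples in duration_s seconds of audio."""
    return round(duration_s * sample_rate_hz)


# A sample-level autoregressive vocoder evaluates the network once per
# sample, strictly in order, so this count is also its sequential depth.
for label, rate in SAMPLE_RATES_HZ.items():
    steps = samples_needed(10, rate)
    print(f"{label}: {steps:,} sequential steps for 10 s of audio")
```

A model without sample-level autoregression can compute those same samples in parallel, which is the motivation for moving the work into convolutional layers.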

Page 9

WAVEGLOW

Generative model?

• Generative adversarial networks (GAN)
• Variational autoencoders (VAE)
• Flow-based methods

Vocoder?

• Traditional signal-processing methods
• Built on neural networks

https://deepmind.com/blog/wavenet-generative-model-raw-audio/

https://openai.com/blog/glow/

Page 10

Flow-Based Generative Models

https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html

Page 11

Maximum Likelihood
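
The slide's equations did not survive extraction. As a reminder of what the title refers to, the standard maximum-likelihood objective for a generative model can be written as below. The notation is an assumption: $G$ is the generator, $p_G$ the density it defines, and $x^{1}, \dots, x^{m}$ are training samples.

```latex
% Assumed notation: x^i are m samples drawn from the data distribution.
G^{*} = \arg\max_{G} \sum_{i=1}^{m} \log p_{G}\!\left(x^{i}\right),
\qquad x^{1}, \dots, x^{m} \sim p_{\mathrm{data}}(x)
```

Maximizing this sum is equivalent to minimizing the KL divergence from $p_{\mathrm{data}}$ to $p_G$, so the model needs a density $p_G(x)$ that can be evaluated exactly.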

Page 12

Jacobian Matrix

https://en.wikipedia.org/wiki/Jacobian_matrix_and_determinant
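
For reference, a two-dimensional instance of the definition, written out under assumed notation ($z$ is the input, $x = f(z)$ the output, $f$ invertible):

```latex
% Assumed notation: z = (z_1, z_2), x = f(z) = (x_1, x_2).
J_{f} =
\begin{bmatrix}
  \partial x_1 / \partial z_1 & \partial x_1 / \partial z_2 \\
  \partial x_2 / \partial z_1 & \partial x_2 / \partial z_2
\end{bmatrix},
\qquad
J_{f^{-1}} =
\begin{bmatrix}
  \partial z_1 / \partial x_1 & \partial z_1 / \partial x_2 \\
  \partial z_2 / \partial x_1 & \partial z_2 / \partial x_2
\end{bmatrix},
\qquad
J_{f}\, J_{f^{-1}} = I
```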

Page 13

Change of Variables Theorem

Flow-based Generative Model by 李宏毅 (Hung-yi Lee)
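
The theorem itself is not legible in the extracted slide. Its standard statement, under assumed notation ($z \sim \pi(z)$ is a simple prior and $x = f(z)$ with $f$ invertible), is:

```latex
% Assumed notation: \pi is the prior density over z, p the density over x.
\begin{aligned}
p(x) &= \pi\!\left(f^{-1}(x)\right)\,
        \left|\det J_{f^{-1}}(x)\right| \\
\log p(x) &= \log \pi\!\left(f^{-1}(x)\right)
           + \log \left|\det J_{f^{-1}}(x)\right|
\end{aligned}
```

The second line is the per-sample term that a flow-based model maximizes during training.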

Page 14

Computing the Jacobian Determinant of the Inverse Transform

Flow-based Generative Model by 李宏毅 (Hung-yi Lee)
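
For an invertible map, the determinant of the inverse's Jacobian is the reciprocal of the forward determinant. The sketch below checks this numerically on a hypothetical 2x2 linear map $x = Az$, whose Jacobian is simply $A$; the matrix and helper names are illustrative only.

```python
import math


def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]


def inv2(m):
    """Inverse of a 2x2 matrix via the adjugate formula."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]


# Hypothetical linear flow x = A z, so J_f = A and J_{f^-1} = A^-1.
J_f = [[2.0, 1.0],
       [1.0, 3.0]]
J_f_inv = inv2(J_f)

# det(J_{f^-1}) should equal 1 / det(J_f).
print(det2(J_f), det2(J_f_inv), det2(J_f) * det2(J_f_inv))
```

This is why flow architectures favor layers whose Jacobian determinant is cheap to compute in either direction.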

Page 15

GLOW

https://arxiv.org/abs/1807.03039

Page 16

WAVENET

https://arxiv.org/abs/1609.03499

Page 17

WAVEGLOW

https://arxiv.org/pdf/1811.00002.pdf
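
The architecture diagram is not recoverable from the extracted text. According to the paper linked above, one of WaveGlow's invertible building blocks is the affine coupling layer. The toy sketch below shows why such a layer inverts exactly; `toy_wn` is a hypothetical stand-in for the paper's WN network, and the plain-list "audio" and "mel" values are made up for illustration.

```python
import math


def toy_wn(x_a, cond):
    """Hypothetical stand-in for WN: any function of (x_a, cond) works."""
    log_s = [0.1 * math.tanh(a + c) for a, c in zip(x_a, cond)]
    t = [0.5 * a - c for a, c in zip(x_a, cond)]
    return log_s, t


def coupling_forward(x, cond):
    """Pass x_a through unchanged; scale and shift x_b using WN(x_a, cond)."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = toy_wn(x_a, cond)
    y_b = [math.exp(ls) * b + ti for ls, b, ti in zip(log_s, x_b, t)]
    log_det = sum(log_s)  # log |det J| of this layer
    return x_a + y_b, log_det


def coupling_inverse(y, cond):
    """Recompute (log_s, t) from the untouched half and undo the affine map."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s, t = toy_wn(y_a, cond)
    x_b = [(b - ti) * math.exp(-ls) for ls, b, ti in zip(log_s, y_b, t)]
    return y_a + x_b


x = [0.3, -1.2, 0.8, 2.0]   # made-up "audio" vector
cond = [0.5, -0.4]          # made-up conditioning, same length as x_a
y, log_det = coupling_forward(x, cond)
x_rec = coupling_inverse(y, cond)
print(x_rec, log_det)
```

Because half of the input passes through unchanged, the inverse never has to invert `toy_wn` itself, and the log-determinant is just the sum of the log scales.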

Page 18

Training

Mixed-precision training

Fine-tuning a pretrained model

Audio data
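
The slide does not say how mixed precision was configured. One standard ingredient of mixed-precision training is loss scaling, which keeps small gradients from underflowing in FP16. The sketch below demonstrates the effect with Python's half-precision `struct` format; the gradient value and the scale of 1024 are hypothetical.

```python
import struct


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


LOSS_SCALE = 1024.0  # hypothetical static loss scale
true_grad = 1e-8     # hypothetical gradient, too small for FP16

# Without scaling, the gradient is flushed to zero when stored in FP16.
naive = to_fp16(true_grad)

# Scaling the loss scales every gradient by the same factor (chain rule),
# simulated here by multiplying the gradient directly before the FP16 cast.
scaled = to_fp16(true_grad * LOSS_SCALE)

# Unscale in higher precision before the weight update.
recovered = scaled / LOSS_SCALE

print(naive, recovered)
```

The recovered value is close to the true gradient rather than exactly equal, because the scaled value still lands in FP16's subnormal range.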

Page 19

Inference

https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/

• Extract weights
• Build the network
• Generate the plan
• FP32 -> FP16
• Online inference
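
Before the FP32 -> FP16 step, it can help to confirm that the extracted weights fit the FP16 range and to see how much rounding the conversion introduces. The sketch below is a plain-Python check, not TensorRT code; the function names and sample weights are hypothetical.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(value: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


def fp16_report(weights):
    """Return (max_abs_rounding_error, fits_in_range) for a flat weight list.

    The range check is conservative: anything above FP16_MAX is rejected.
    """
    if any(abs(w) > FP16_MAX for w in weights):
        return None, False
    max_err = max(abs(w - to_fp16(w)) for w in weights)
    return max_err, True


# Hypothetical weights standing in for tensors extracted from a checkpoint.
weights = [0.1, -0.25, 1e-3, 3.0]
max_err, fits = fp16_report(weights)
print(max_err, fits)
```

Values that are exactly representable in FP16, such as -0.25, convert without error; others, such as 0.1, pick up a small rounding error.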

Page 20

Acceleration Solution

Page 21

TACOTRON2

https://arxiv.org/pdf/1712.05884.pdf

Page 22

TACOTRON2

The decoder's GPU functions (kernels) are too fine-grained, which makes the decoder the performance bottleneck
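
A back-of-the-envelope model shows why many tiny kernels hurt: the decoder runs step by step, so the fixed per-launch overhead is paid for every kernel at every step. All numbers below are hypothetical placeholders, not measurements from the talk.

```python
def launch_overhead_ms(decoder_steps: int,
                       kernels_per_step: int,
                       launch_overhead_us: float) -> float:
    """Total time spent on kernel launches alone, in milliseconds."""
    return decoder_steps * kernels_per_step * launch_overhead_us / 1000.0


# Hypothetical values: 800 decoder steps, 5 microseconds per launch.
fragmented = launch_overhead_ms(800, kernels_per_step=50, launch_overhead_us=5.0)
fused = launch_overhead_ms(800, kernels_per_step=5, launch_overhead_us=5.0)

print(f"fragmented: {fragmented:.0f} ms, fused: {fused:.0f} ms")
```

Under this model the overhead scales linearly with the number of kernels per step, so fusing layers reduces it in direct proportion.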

Page 23

Accelerating TACOTRON2

Layers that TensorRT supports are mapped directly to their TensorRT implementations

Page 24

Accelerating TACOTRON2

The model's remaining layers are implemented and plugged in as plugins

Layer fusion and targeted optimizations are done at the C++/CUDA code level

Credits to Nvidia DevTech Team for optimizing Tacotron2 on GPU

Page 25

Speedups Achieved So Far

Tacotron2+WaveGlow on V100

• Original implementation: below 10x real time

• After acceleration: above 50x real time

Accelerate for Deployment
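
The "Nx real time" figures above are read here as seconds of audio produced per second of compute; that definition is an assumption, since the slide does not spell it out. A minimal sketch with hypothetical timings:

```python
def realtime_factor(audio_seconds: float, synthesis_seconds: float) -> float:
    """Seconds of audio generated per second of wall-clock synthesis time."""
    if synthesis_seconds <= 0:
        raise ValueError("synthesis_seconds must be positive")
    return audio_seconds / synthesis_seconds


# Hypothetical timings for 10 s of audio, not measurements from the talk.
slow = realtime_factor(10.0, 2.0)    # would count as "below 10x real time"
fast = realtime_factor(10.0, 0.125)  # would count as "above 50x real time"

print(f"{slow:.0f}x, {fast:.0f}x")
```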

Page 26

References

J. Shen et al., "Natural TTS Synthesis by Conditioning Wavenet on MEL Spectrogram Predictions," 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, 2018, pp. 4779-4783.

R. Prenger, R. Valle and B. Catanzaro, "Waveglow: A Flow-based Generative Network for Speech Synthesis," ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, 2019, pp. 3617-3621.

A. van den Oord, S. Dieleman, H. Zen et al., "WaveNet: A Generative Model for Raw Audio," arXiv preprint arXiv:1609.03499, 2016.

D. P. Kingma and P. Dhariwal, "Glow: Generative Flow with Invertible 1x1 Convolutions," Advances in Neural Information Processing Systems, 2018, pp. 10215-10224.

https://lilianweng.github.io/lil-log/2018/10/13/flow-based-deep-generative-models.html

https://docs.nvidia.com/deeplearning/sdk/tensorrt-api/python_api/index.html

https://www.youtube.com/watch?v=uXY18nzdSsM

Page 27

Thank you!