Natural Language Processing: An Overview Hang Li Noah’s Ark Lab Huawei Technologies Peking University April 18, 2017

Natural Language Processing: An Overview

Hang Li

Noah’s Ark Lab

Huawei Technologies

Peking University April 18, 2017

Talk Outline

• Introduction to Natural Language Processing

• State-of-the-Art Technologies of Natural Language Processing

• Future Trends of Natural Language Processing

Ultimate Goal: Natural Language Understanding

• Text Comprehension

• Natural Language Dialogue

Natural Language Understanding

• Two definitions:

– Representation-based: if a system creates a proper internal representation, we say it “understands” language

– Behavior-based: if a system properly follows an instruction given in natural language (e.g., “bring me a cup of tea”), we say it “understands” language

• We take the latter definition

Five Characteristics of Human Language

• Incompletely Regular (Both Regular and Idiosyncratic)

• Compositional (or Recursive)

• Metaphorical

• Associated with Knowledge

• Interactive

Natural Language Understanding by Computer Is Extremely Difficult

• It is still not clear whether human language ability can be realized on a computer

• On modern computers:

– Incomplete regularity and compositionality imply complex combinatorial computation

– Metaphor, knowledge, and interaction imply exhaustive computation

• Big question: can we invent a new kind of computer that is closer to the human brain?

Reason for the Challenge

• A computer system must be built on mathematical models

• Open question: can natural language be processed the way humans process it, using mathematical models?

• Natural language processing is believed to be AI-complete

Simplified Problem Formulation - E.g., Question Answering

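[Figure: two processing pipelines. Labels on the full pipeline: Analysis, Understanding, Inference, Decision, Retrieval, Generation. Labels on the simplified pipeline: Analysis, Retrieval, Generation.]

Question answering, including search, is feasible in practice because the problem is simplified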

Data-driven Approach Works

• A hybrid approach is the most realistic and effective for natural language processing, and for AI in general:

– machine learning based

– human-knowledge incorporated

– human brain inspired

• Big data and deep learning provide new opportunities

AI Loop

[Figure: a closed loop connecting System, Users, Data, and Algorithm]

Advances in AI, including NLP, can be made through this closed loop

Fundamental Problems of Statistical Natural Language Processing

• Classification: assigning a label to a string, $s \to c$

• Matching: matching two strings, $s, t \to \mathbb{R}$

• Translation: transforming one string into another, $s \to t$

• Structured prediction: mapping a string to a structure, $s \to s'$

• Markov decision process: deciding the next state given the previous state and action
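The five problem types can be read as five function signatures. The sketch below is only an illustration of those signatures: the type aliases, the toy lexicons, and the dialogue table are all hypothetical and stand in for learned models.

```python
from typing import Callable, Dict, List, Tuple

# Illustrative type aliases, one per problem type (names are assumptions).
Classifier = Callable[[str], str]                 # string -> label
Matcher = Callable[[str, str], float]             # two strings -> real-valued score
Translator = Callable[[str], str]                 # string -> string
Tagger = Callable[[str], List[Tuple[str, str]]]   # string -> structure
Transition = Callable[[str, str], str]            # (state, action) -> next state

POSITIVE_WORDS = {"good", "great", "love"}        # toy lexicon, assumed
LEXICON: Dict[str, str] = {"hello": "bonjour", "world": "monde"}  # toy dictionary
DIALOGUE_TABLE: Dict[Tuple[str, str], str] = {    # toy state machine, assumed
    ("ask_city", "user_gave_city"): "ask_date",
    ("ask_date", "user_gave_date"): "confirm",
}


def classify(s: str) -> str:
    """Classification: assign a sentiment label to a string."""
    words = s.lower().split()
    return "positive" if any(w in POSITIVE_WORDS for w in words) else "negative"


def match(s: str, t: str) -> float:
    """Matching: score two strings by word overlap (Jaccard)."""
    a, b = set(s.lower().split()), set(t.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def translate(s: str) -> str:
    """Translation: map one string to another, word by word."""
    return " ".join(LEXICON.get(w, w) for w in s.lower().split())


def tag(s: str) -> List[Tuple[str, str]]:
    """Structured prediction: map a string to a sequence of (token, tag) pairs."""
    return [(w, "NUM" if w.isdigit() else "WORD") for w in s.split()]


def next_state(state: str, action: str) -> str:
    """Markov decision process: next state from previous state and action."""
    return DIALOGUE_TABLE.get((state, action), state)


# Each toy function satisfies the corresponding signature.
classifier: Classifier = classify
matcher: Matcher = match
translator: Translator = translate
tagger: Tagger = tag
transition: Transition = next_state
```

In a real system each function would be a trained model; only the input and output types carry over.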

Fundamental Problems of Statistical Natural Language Processing

• Classification

– Text classification

– Sentiment analysis

• Matching

– Search

– Question answering

– Dialogue (single turn)

• Translation

– Machine translation

– Speech recognition

– Handwriting recognition

– Dialogue (single turn)

• Structured Prediction

– Named entity extraction

– Part of speech tagging

– Sentence parsing

– Semantic parsing

• Markov Decision Process

– Dialogue (multi turn, task dependent)

Lower Bound of User Need vs Upper Bound of Technology

[Figure labels: Upper Bound of Technology; Lower Bound of User Need; Pushing Upper Bound of Technology]

Talk Outline

• Introduction to Natural Language Processing

• State-of-the-Art Technologies of Natural Language Processing

• Future Trends of Natural Language Processing

Applications

• Question Answering

• Image Retrieval

• Single Turn Dialogue

• Machine Translation

Question Answering - DeepMatch CNN

Retrieval-based Question Answering System

[Figure: system diagram. Online: Question → Retrieval → Retrieved Questions and Answers → Matching → Matched Answers → Ranking → Ranked Answers → Best Answer. Offline: Index of Questions and Answers (used by Retrieval), Matching Models (used by Matching), Ranking Model (used by Ranking).]
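The retrieve, match, and rank stages of such a system can be sketched in a few lines. This is a minimal sketch under stated assumptions: the index contents, the two word-overlap matching functions, and the fixed ranking weights are all hypothetical, standing in for the learned matching and ranking models of a real system.

```python
from typing import List, Optional, Set, Tuple

QAPair = Tuple[str, str]

# Hypothetical offline index of (question, answer) pairs.
QA_INDEX: List[QAPair] = [
    ("how do i reset my password", "Open Settings and choose Reset Password."),
    ("how do i change my email", "Open Settings and edit the Email field."),
    ("what is the refund policy", "Refunds are available within 30 days."),
]


def tokens(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace("?", " ").split())


def retrieve(question: str, index: List[QAPair]) -> List[QAPair]:
    """Retrieval: keep indexed pairs sharing at least one word with the question."""
    q = tokens(question)
    return [pair for pair in index if q & tokens(pair[0])]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Matching feature 1: word overlap divided by word union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def coverage(a: Set[str], b: Set[str]) -> float:
    """Matching feature 2: fraction of the question's words found in the candidate."""
    return len(a & b) / len(a) if a else 0.0


MATCHING_MODELS = [jaccard, coverage]   # stand-ins for learned matching models
RANKING_WEIGHTS = [0.5, 0.5]            # stand-in for a learned ranking model


def rank(question: str, candidates: List[QAPair]) -> List[Tuple[float, str]]:
    """Matching and ranking: score each candidate, then sort best first."""
    q = tokens(question)
    scored = []
    for cand_question, cand_answer in candidates:
        features = [model(q, tokens(cand_question)) for model in MATCHING_MODELS]
        score = sum(w * f for w, f in zip(RANKING_WEIGHTS, features))
        scored.append((score, cand_answer))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def best_answer(question: str, index: List[QAPair] = QA_INDEX) -> Optional[str]:
    """Run retrieval, matching, and ranking; return the top answer, if any."""
    ranked = rank(question, retrieve(question, index))
    return ranked[0][1] if ranked else None
```

Calling `best_answer("How do I reset my password?")` walks the online path end to end; a question sharing no words with the index yields `None`.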

Deep Match Model CNN

• Represent and match two sentences simultaneously

• Two dimensional model

• State-of-the-art model for matching in question answering

[Figure: Deep Match CNN architecture. Labels: Sentence X, Sentence Y, 1D Convolution, Max-Pooling, 2D Convolution, More 2D Convolution & Pooling, MLP, Matching Degree.]
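The two-dimensional idea can be illustrated with a toy example: every position of one sentence is paired with every position of the other, giving a 2D grid that can then be pooled. The sketch below is not the Deep Match CNN itself. It swaps the learned convolution for a plain dot product between hypothetical word vectors, and shows only the pairwise interaction grid followed by 2x2 max-pooling.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def interaction_matrix(xs: List[Vector], ys: List[Vector]) -> Matrix:
    """Entry (i, j) scores word i of sentence X against word j of sentence Y.

    Assumption: a dot product replaces the learned convolution filter.
    """
    return [[dot(x, y) for y in ys] for x in xs]


def max_pool_2x2(matrix: Matrix) -> Matrix:
    """Non-overlapping 2x2 max-pooling; edge windows may be smaller."""
    rows, cols = len(matrix), len(matrix[0])
    pooled = []
    for i in range(0, rows, 2):
        pooled_row = []
        for j in range(0, cols, 2):
            window = [
                matrix[a][b]
                for a in range(i, min(i + 2, rows))
                for b in range(j, min(j + 2, cols))
            ]
            pooled_row.append(max(window))
        pooled.append(pooled_row)
    return pooled


# Hypothetical 2-dimensional word vectors for a 3-word and a 2-word sentence.
sentence_x = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0]]
sentence_y = [[1.0, 0.0], [0.0, 1.0]]

grid = interaction_matrix(sentence_x, sentence_y)
pooled = max_pool_2x2(grid)
```

In the full model, further 2D convolution and pooling layers and an MLP would turn such a grid into a single matching degree.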

Image Retrieval - Multimodal CNN

Demo

Multimodal CNN

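[Figure: an image and the sentence “a dog is catching a ball” are each processed by a CNN, and an MLP combines the two outputs.]

• One Convolutional Neural Network represents the image

• One Convolutional Neural Network represents the text

• A Multi-Layer Perceptron conducts the matching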

Natural Language Dialogue - Neural Responding Machine

Demo

Neural Responding Machine

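[Figure: encoder-decoder diagram. Input words $x_1, \dots, x_t, \dots, x_T$ are encoded into hidden states $h_1, \dots, h_t, \dots, h_T$. Context vectors $c_1, \dots, c_t, \dots, c_{T'}$ and a vector labeled $c$ feed the decoder states $s_1, \dots, s_t, \dots, s_{T'}$, which generate the output words $y_1, \dots, y_t, \dots, y_{T'}$.

Example post: “I come to Hainan for vacation every winter” (每年 冬天 都 来 海南 度假).

Example response: “I envy you so much, have a pleasant trip” (太 羡慕 你 了 祝 旅行 愉快).]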

• Using both local and global attention mechanisms
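One way to picture combining a global and a local attention signal is to concatenate a whole-post summary with a position-weighted summary. The sketch below is a minimal illustration under assumptions not stated in the slides: the global context is taken to be the last encoder hidden state, and the local attention weights come from a dot-product score followed by a softmax. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_context(hidden_states: List[Vector], decoder_state: Vector) -> Vector:
    """Concatenate a global context with a local (attention-weighted) context."""
    # Global: one vector summarising the whole post (assumed: last hidden state).
    global_ctx = hidden_states[-1]
    # Local: weight each encoder position by its relevance to the decoder state
    # (assumed: dot-product scoring).
    weights = softmax([dot(h, decoder_state) for h in hidden_states])
    dim = len(hidden_states[0])
    local_ctx = [
        sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(dim)
    ]
    return global_ctx + local_ctx


# Two hypothetical encoder states and a decoder state that scores them equally.
context = hybrid_context([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

The decoder would consume such a concatenated vector at each output step, recomputing the local part as its state changes.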

Neural Machine Translation - Google Neural Machine Translation

Google Neural Machine Translation

• Sequence-to-Sequence Learning Model

• 8-layer encoder and 8-layer decoder

• Residual connections, and attention connections from the bottom of the decoder to the top of the encoder

• Model partition and data partition

• Use sub-word units for both input and output to deal with rare words

• Use length normalization and coverage penalty
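Length normalization and the coverage penalty both act on the score used to rank candidate translations during beam search. The sketch below follows the scoring formulas given in the GNMT paper (Wu et al., 2016): the log-probability is divided by a length penalty $lp(Y) = (5 + |Y|)^\alpha / (5 + 1)^\alpha$, and a coverage term $\beta \sum_i \log(\min(\sum_j p_{i,j}, 1))$ is added. The function names and the default values of `alpha` and `beta` are illustrative, and attention probabilities are assumed to be strictly positive, as they are after a softmax.

```python
import math
from typing import List


def length_penalty(target_length: int, alpha: float = 0.2) -> float:
    """lp(Y) = ((5 + |Y|) / 6) ** alpha; equals 1 when alpha is 0."""
    return ((5.0 + target_length) / 6.0) ** alpha


def coverage_penalty(attention: List[List[float]], beta: float = 0.2) -> float:
    """beta * sum over source words of log(min(total attention received, 1)).

    attention[j][i] is the attention probability that target word j
    places on source word i (assumed strictly positive).
    """
    n_source = len(attention[0])
    total = 0.0
    for i in range(n_source):
        received = sum(row[i] for row in attention)
        total += math.log(min(received, 1.0))
    return beta * total


def candidate_score(
    log_prob: float,
    attention: List[List[float]],
    alpha: float = 0.2,
    beta: float = 0.2,
) -> float:
    """s(Y, X) = log P(Y|X) / lp(Y) + cp(X; Y)."""
    target_length = len(attention)
    return log_prob / length_penalty(target_length, alpha) + coverage_penalty(
        attention, beta
    )


# A two-word hypothesis that attends almost only to the first of two source words.
skewed = [[0.9, 0.1], [0.9, 0.1]]
# A two-word hypothesis that covers both source words fully.
balanced = [[1.0, 1e-9], [1e-9, 1.0]]
```

With equal log-probabilities, `candidate_score` ranks `balanced` above `skewed`, because the second source word of `skewed` receives a total attention of only 0.2 and is penalized.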

Architecture of Google Neural Machine Translation

Talk Outline

• Introduction to Natural Language Processing

• State-of-the-Art Technologies of Natural Language Processing

• Future Trends of Natural Language Processing

Performances

| Task | Setting | Problem Formulation | Accuracy |
|---|---|---|---|
| Automatic Speech Recognition | Ideal Environment | Translation | 95% |
| Dialogue | Single Turn | Classification or Structured Prediction | 80%-90% |
| Dialogue | Multi Turn | Markov Decision Process | 50%-70% |
| Question Answering | Single Turn | Matching | 70%-80% |
| Machine Translation | Written Language Translation | Translation | 70%-80% (derived from BLEU score) |

Trend One: Speech Recognition and Translation Are Taking Off

• Automatic Speech Recognition is being widely used in language input

• Written Language Translation will be more widely used in practice

• Spoken Language Translation will be gradually utilized and improved

• There are still issues to be solved, e.g., the long-tail challenge

Trend Two: Single-Turn Dialogue and Single-Turn Question Answering Will Take Off

• Task-dependent single-turn dialogue will gradually come into use

• Single-turn question answering will gradually come into use

• Both can be extended to multi-turn with heuristics

• Open question: is generation-based single-turn dialogue practically useful?

Trend Three: Multi-Turn Dialogue Needs More Research

• Must be task-dependent

• Reinforcement learning can be a key technology

• Data needs to be collected first, and then the AI loop can be run

• Simple (not complex) task-dependent multi-turn dialogue will be realized

• Chatbots are very difficult: performance is not high when only single-turn technologies are used
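To make the link between task-dependent multi-turn dialogue, Markov decision processes, and reinforcement learning concrete, here is a minimal sketch using tabular Q-learning on a toy booking dialogue. Everything in it is an assumption made for illustration: the two slots, the three system actions, the reward values, the fully cooperative user, and the hyperparameters. It is not a method taken from the talk.

```python
import random
from typing import Dict, Tuple

State = Tuple[bool, bool]                 # (city known, date known)
ACTIONS = ["ask_city", "ask_date", "book"]


def step(state: State, action: str) -> Tuple[State, float, bool]:
    """Toy environment: returns (next state, reward, episode finished)."""
    city, date = state
    if action == "ask_city":
        return (True, date), -1.0, False      # each turn costs 1
    if action == "ask_date":
        return (city, True), -1.0, False
    if city and date:
        return state, 10.0, True              # successful booking
    return state, -5.0, False                 # booking attempted too early


def train(
    episodes: int = 500,
    alpha: float = 0.5,
    gamma: float = 0.9,
    epsilon: float = 0.2,
    seed: int = 0,
) -> Dict[Tuple[State, str], float]:
    """Tabular Q-learning with an epsilon-greedy policy."""
    rng = random.Random(seed)
    q = {
        ((c, d), a): 0.0
        for c in (False, True)
        for d in (False, True)
        for a in ACTIONS
    }
    for _ in range(episodes):
        state = (False, False)
        for _ in range(20):                   # cap on dialogue length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(nxt, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = nxt
            if done:
                break
    return q


def greedy(q: Dict[Tuple[State, str], float], state: State) -> str:
    """Best action in a state according to the learned Q-table."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


q_table = train()
```

After training, `greedy(q_table, state)` gives the learned dialogue policy: ask for whichever slot is missing, then book. Real dialogue states are far larger and users are not deterministic, which is part of why the slides call for collected data and more research.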

Summary

• Natural Language Understanding is difficult

• Five fundamental problems in natural language processing

• AI loop is important

• Deep learning achieves state-of-the-art performance, particularly for machine translation

• Speech recognition, translation, single-turn dialogue, and single-turn question answering technologies will be continuously improved and gradually used in practice

References

1. Hang Li. Embracing a New Era of Natural Language Processing (迎接自然语言处理新时代). Communications of the CCF (计算机学会通讯), 2017, No. 2.

2. Hang Li. A Brief Discussion of Artificial Intelligence (简论人工智能). Communications of the CCF (计算机学会通讯), 2016, No. 3.

3. Hang Li. What Should We Expect from AI (对于AI我们应该期待什么). Communications of the CCF (计算机学会通讯), 2016, No. 11.

4. Hang Li. The Upper Bound of Technology and the Lower Bound of Need (技术的上界与需求的下界). Sina Blog (新浪博客), 2014.

5. Lin Ma, Zhengdong Lu, Lifeng Shang, Hang Li. Multimodal Convolutional Neural Networks for Matching Image and Sentence. ICCV'15, 2623-2631, 2015.

6. Baotian Hu, Zhaopeng Tu, Zhengdong Lu, Hang Li, Qingcai Chen. Context-Dependent Translation Selection Using Convolutional Neural Network. ACL-IJCNLP'15, 536-541, 2015.

7. Lifeng Shang, Zhengdong Lu, Hang Li. Neural Responding Machine for Short Text Conversation. ACL-IJCNLP'15, 1577-1586, 2015.

8. Yonghui Wu, Mike Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. 2016.

Thank you!