CSC 578 Neural Networks and Deep Learning
11. Neural Natural Language Processing (Overview)
Noriko Tomuro, DePaul University


1. Text Categorization using Neural Networks
• A simple example that uses a FeedForward network for sentiment analysis (logistic regression, where the output 1 means positive and 0 means negative).
[Figure: a feedforward network taking the review words “amazing”, “visual”, “effects” as input and producing a positive/negative output]
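As an illustration of this kind of bag-of-words classifier, here is a minimal hedged Keras sketch. The vocabulary size and the random training data are made up for the example, not taken from the slides.

import numpy as np
from tensorflow import keras

vocab_size = 1000   # illustrative vocabulary size (not from the slides)

# Fake data: each review is a multi-hot bag-of-words vector; label 1 = positive, 0 = negative
x_train = np.random.randint(0, 2, size=(100, vocab_size)).astype("float32")
y_train = np.random.randint(0, 2, size=(100, 1)).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(vocab_size,)),
    keras.layers.Dense(1, activation='sigmoid')   # logistic-regression-style positive/negative output
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)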
• But this model ignores word order or dependency between words (because the model is essentially Bag-Of-Words (BOWs)).
• Since text in human language is a linear sequence of words, Recurrent Neural Networks can model the dependencies between words.
[Figure: an RNN reading “amazing”, “visual”, “effects” at time steps t−2, t−1, t. From Speech and Language Processing (3rd ed. draft), Dan Jurafsky and James H. Martin]
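A minimal hedged sketch of the recurrent version, assuming integer-encoded word sequences; the sizes and random data are illustrative, not from the slides.

import numpy as np
from tensorflow import keras

vocab_size, seq_len = 1000, 20   # illustrative sizes

# Fake data: integer-encoded word sequences and 0/1 sentiment labels
x_train = np.random.randint(1, vocab_size, size=(100, seq_len))
y_train = np.random.randint(0, 2, size=(100, 1)).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(seq_len,)),
    keras.layers.Embedding(vocab_size, 32),
    keras.layers.SimpleRNN(32),                     # processes the words in order, so word order matters
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)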
2. Vector Representation of Words and Text
[Figures: vector representations of words and text, from Speech and Language Processing (3rd ed. draft), Dan Jurafsky and James H. Martin]
Term Weighting (TFiDF)
[Figures: tf-idf term weighting, from Speech and Language Processing (3rd ed. draft), Dan Jurafsky and James H. Martin]
[Figure: “amazing”, “effects”, “visual” mapped to positions in a |V|-dimensional vector (word indexes 0 … |V|−1, e.g. 10, 1500, 4000), with real-valued weights such as 0.02, 0.18, 0.3 at those positions]
V is the vocabulary of the corpus and |V| is its size.
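A hedged sketch of how such |V|-dimensional tf-idf vectors can be produced; scikit-learn's TfidfVectorizer is used only for brevity, and the tiny corpus is made up for illustration (neither comes from the slides).

from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny made-up corpus; each document becomes a |V|-dimensional tf-idf vector
docs = ["amazing visual effects",
        "the effects were not amazing",
        "visual style over substance"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)            # shape: (num_documents, |V|)

print(vectorizer.get_feature_names_out())     # the vocabulary V
print(X.toarray()[0])                         # first document as a |V|-dimensional vector of tf-idf weights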
3. Word Embedding
• “A word embedding is a learned representation for text where words that have the same meaning have a similar representation.” (https://machinelearningmastery.com/what-are-word-embeddings/)
Context of a Word
• Example: a window of ± 7 words
Word-word matrix
• The size of the window depends on your goals:
  – The shorter the window, the more syntactic the representation
  – The longer the window, the more semantic the representation
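A minimal sketch of building such a word-word co-occurrence matrix with a symmetric window; the window size and the toy corpus are illustrative, not from the slides.

from collections import defaultdict

# Tiny made-up corpus of tokenized sentences
corpus = [["amazing", "visual", "effects", "in", "this", "movie"],
          ["the", "visual", "effects", "were", "amazing"]]
window = 2   # +/- 2 words here; the slide's example uses +/- 7

cooc = defaultdict(lambda: defaultdict(int))   # cooc[w][c] = how often c appears within the window of w
for sent in corpus:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[w][sent[j]] += 1

print(dict(cooc["visual"]))   # context counts for "visual"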
Word2vec
• Word2Vec is a statistical method for efficiently learning a standalone word embedding from a text corpus.
• It was developed by Tomas Mikolov et al. at Google in 2013, in order to make neural-network-based training of the embedding more efficient.
• Instead of counting how often each word w occurs near “apricot”, train a (neural network) classifier on a binary prediction task: “Is w likely to show up near ‘apricot’?” This is called Language Modelling in NLP.
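A hedged Keras sketch of this binary prediction task (essentially skip-gram with negative sampling): the model scores (target, context) word pairs, labeled 1 if the context word really occurred near the target and 0 if it was randomly sampled. All sizes, names, and the random data here are assumptions for illustration.

import numpy as np
from tensorflow import keras

vocab_size, dim = 1000, 50   # illustrative sizes

# Fake training pairs: (target word index, candidate context word index)
# label 1 = context word really occurred near the target, 0 = randomly sampled "negative" word
targets  = np.random.randint(0, vocab_size, size=(500, 1))
contexts = np.random.randint(0, vocab_size, size=(500, 1))
labels   = np.random.randint(0, 2, size=(500, 1)).astype("float32")

t_in = keras.Input(shape=(1,), dtype="int32")
c_in = keras.Input(shape=(1,), dtype="int32")
target_emb  = keras.layers.Embedding(vocab_size, dim)    # these weights become the word vectors we keep
context_emb = keras.layers.Embedding(vocab_size, dim)
vt = keras.layers.Flatten()(target_emb(t_in))
vc = keras.layers.Flatten()(context_emb(c_in))
score = keras.layers.Dot(axes=1)([vt, vc])                # similarity of target and context vectors
prob  = keras.layers.Activation("sigmoid")(score)         # P("context word shows up near target word")

model = keras.Model([t_in, c_in], prob)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit([targets, contexts], labels, epochs=2, batch_size=32, verbose=0)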
• Two different learning models were introduced that can be used as part of the word2vec approach to learn the word embedding:
  – Continuous Bag-of-Words (CBOW) model
  – Continuous Skip-Gram model
• The CBOW model learns the embedding by predicting the current word based on its context.
• The continuous skip-gram model learns by predicting the surrounding words given a current word.
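A hedged sketch of training both variants with the gensim library (a library choice assumed here, not prescribed by the slides); the sg flag switches between CBOW and skip-gram.

# Requires gensim 4.x
from gensim.models import Word2Vec

sentences = [["amazing", "visual", "effects"],
             ["the", "visual", "effects", "were", "amazing"]]   # tiny made-up corpus

cbow      = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=0)  # sg=0: CBOW (predict word from context)
skip_gram = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1)  # sg=1: skip-gram (predict context from word)

print(cbow.wv["visual"][:5])                       # first few dimensions of a learned vector
print(skip_gram.wv.most_similar("visual", topn=2))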
Embeddings in Word Prediction
[Figures: embeddings used for word prediction, from Speech and Language Processing (3rd ed. draft), Dan Jurafsky and James H. Martin]
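A minimal hedged sketch of one common setup for word prediction with embeddings, a fixed-window feedforward language model; the sizes are illustrative and not taken from the slides or the figures.

from tensorflow import keras

# Illustrative sizes: predict word t from the embeddings of words t-3, t-2, t-1
vocab_size, context_size, dim = 1000, 3, 50

model = keras.Sequential([
    keras.Input(shape=(context_size,)),
    keras.layers.Embedding(vocab_size, dim),              # look up the three context-word embeddings
    keras.layers.Flatten(),                               # concatenate them into one vector
    keras.layers.Dense(128, activation='relu'),           # hidden layer
    keras.layers.Dense(vocab_size, activation='softmax')  # probability of each word in V being the next word
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.summary()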
Code Example (1): The Embedding Layer in Keras

The Embedding layer turns positive integers (indexes) into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]. So, this layer is used to learn an embedding from scratch.

Arguments
• input_dim: int > 0. Size of the vocabulary, i.e. maximum integer index + 1.
• output_dim: int >= 0. Dimension of the dense embedding.

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(1000, 64, input_length=10))
# The model will take as input an integer matrix of size (batch, input_length).
# The largest integer (i.e. word index) in the input should be
# no larger than 999 (vocabulary size).
# Now model.output_shape == (None, 10, 64), where None is the batch dimension.

input_array = np.random.randint(1000, size=(32, 10))

model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
assert output_array.shape == (32, 10, 64)
Code Example (2): Embedding in a FeedForward Network for Text Classification
model = keras.Sequential([
    keras.layers.Embedding(encoder.vocab_size, 16),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')])
1. The first layer is an Embedding layer. This layer takes the integer-encoded vocabulary and looks up the embedding vector for each word-index. These vectors are learned as the model trains. The vectors add a dimension to the output array. The resulting dimensions are: (batch, sequence, embedding).
2. Next, a GlobalAveragePooling1D layer returns a fixed-length output vector for each example by averaging over the sequence dimension. This allows the model to handle input of variable length, in the simplest way possible.
3. This fixed-length output vector is piped through a fully-connected (Dense) layer with 16 hidden units.
4. The last layer is densely connected with a single output node. Using the sigmoid activation function, this value is a float between 0 and 1, representing a probability, or confidence level.
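To actually train this model, one would compile and fit it along these lines. This is a hedged sketch: the slides do not show the training call, and train_data / train_labels are assumed to be a padded, integer-encoded dataset prepared elsewhere.

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# train_data: padded integer-encoded reviews; train_labels: 0/1 sentiment labels (both assumed to exist)
history = model.fit(train_data, train_labels,
                    epochs=10, batch_size=512, validation_split=0.2)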
Code Example (3): Embedding in a RNN Network for Text Classification
• With two stacked Bidirectional LSTM layers
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid')])
Code Example (4): Using Pre-trained Word Embeddings (GloVe)
• See the example on the Keras documentation site: https://keras.io/examples/pretrained_word_embeddings/
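The gist of that example, as a condensed and hedged sketch: read pre-trained GloVe vectors from a local file into an embedding matrix and use it to initialize a frozen Embedding layer. The file name, the toy word_index, and the dimensions are assumptions for illustration; see the linked page for the full version.

import numpy as np
from tensorflow import keras

embedding_dim = 100
word_index = {"amazing": 1, "visual": 2, "effects": 3}   # toy vocabulary; normally built by a tokenizer

# Read pre-trained GloVe vectors (file path assumed; download glove.6B from the GloVe website)
embeddings_index = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype="float32")

# Build an embedding matrix aligned with our own word indices
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:              # words not found in GloVe stay all-zeros
        embedding_matrix[i] = vector

# Embedding layer initialized with the pre-trained vectors and kept frozen during training
embedding_layer = keras.layers.Embedding(
    len(word_index) + 1, embedding_dim,
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
    trainable=False)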