CSC 578 Neural Networks and Deep Learning
11. Neural Natural Language Processing (Overview)
Noriko Tomuro, DePaul University


1. Text Categorization using Neural Networks
• A simple example that uses a FeedForward network for sentiment analysis (logistic regression, where the output 1 means positive and 0 means negative).
[Figure: a feedforward network taking the review words “amazing”, “visual”, “effects” as input and producing a positive/negative output]
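As an illustration of this kind of bag-of-words classifier, here is a minimal hedged Keras sketch. The vocabulary size and the random training data are made up for the example, not taken from the slides.

import numpy as np
from tensorflow import keras

vocab_size = 1000   # illustrative vocabulary size (not from the slides)

# Fake data: each review is a multi-hot bag-of-words vector; label 1 = positive, 0 = negative
x_train = np.random.randint(0, 2, size=(100, vocab_size)).astype("float32")
y_train = np.random.randint(0, 2, size=(100, 1)).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(vocab_size,)),
    keras.layers.Dense(1, activation='sigmoid')   # logistic-regression-style positive/negative output
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)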
• But this model ignores word order or dependency between words (because the model is essentially Bag-Of-Words (BOWs)).
• Since text in human language is a linear sequence of words, Recurrent Neural Networks can model the dependencies between words.
[Figure: an RNN reading “amazing”, “visual”, “effects” at time steps t−2, t−1, t. From Speech and Language Processing (3rd ed. draft), Dan Jurafsky and James H. Martin]
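A minimal hedged sketch of the recurrent version, assuming integer-encoded word sequences; the sizes and random data are illustrative, not from the slides.

import numpy as np
from tensorflow import keras

vocab_size, seq_len = 1000, 20   # illustrative sizes

# Fake data: integer-encoded word sequences and 0/1 sentiment labels
x_train = np.random.randint(1, vocab_size, size=(100, seq_len))
y_train = np.random.randint(0, 2, size=(100, 1)).astype("float32")

model = keras.Sequential([
    keras.Input(shape=(seq_len,)),
    keras.layers.Embedding(vocab_size, 32),
    keras.layers.SimpleRNN(32),                     # processes the words in order, so word order matters
    keras.layers.Dense(1, activation='sigmoid')
])
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
model.fit(x_train, y_train, epochs=2, batch_size=32, verbose=0)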
2. Vector Representation of Words and Text
[Figures: vector representations of words and text, from Speech and Language Processing (3rd ed. draft), Dan Jurafsky and James H. Martin]
Term Weighting (TFiDF)
[Figures: tf-idf term weighting, from Speech and Language Processing (3rd ed. draft), Dan Jurafsky and James H. Martin]
[Figure: “amazing”, “effects”, “visual” mapped to positions in a |V|-dimensional vector (word indexes 0 … |V|−1, e.g. 10, 1500, 4000), with real-valued weights such as 0.02, 0.18, 0.3 at those positions]
V is the vocabulary of the corpus and |V| is its size.
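A hedged sketch of how such |V|-dimensional tf-idf vectors can be produced; scikit-learn's TfidfVectorizer is used only for brevity, and the tiny corpus is made up for illustration (neither comes from the slides).

from sklearn.feature_extraction.text import TfidfVectorizer

# Tiny made-up corpus; each document becomes a |V|-dimensional tf-idf vector
docs = ["amazing visual effects",
        "the effects were not amazing",
        "visual style over substance"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)            # shape: (num_documents, |V|)

print(vectorizer.get_feature_names_out())     # the vocabulary V
print(X.toarray()[0])                         # first document as a |V|-dimensional vector of tf-idf weights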
3. Word Embedding
• “A word embedding is a learned representation for text where words that have the same meaning have a similar representation.” (https://machinelearningmastery.com/what-are-word-embeddings/)
Context of a Word
• Example: a window of ± 7 words
Word-word matrix
• The size of the window depends on your goals:
  – The shorter the window, the more syntactic the representation
  – The longer the window, the more semantic the representation
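A minimal sketch of building such a word-word co-occurrence matrix with a symmetric window; the window size and the toy corpus are illustrative, not from the slides.

from collections import defaultdict

# Tiny made-up corpus of tokenized sentences
corpus = [["amazing", "visual", "effects", "in", "this", "movie"],
          ["the", "visual", "effects", "were", "amazing"]]
window = 2   # +/- 2 words here; the slide's example uses +/- 7

cooc = defaultdict(lambda: defaultdict(int))   # cooc[w][c] = how often c appears within the window of w
for sent in corpus:
    for i, w in enumerate(sent):
        lo, hi = max(0, i - window), min(len(sent), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                cooc[w][sent[j]] += 1

print(dict(cooc["visual"]))   # context counts for "visual"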
Word2vec
• Word2Vec is a statistical method for efficiently learning a standalone word embedding from a text corpus.
• It was developed by Tomas Mikolov et al. at Google in 2013, in order to make neural-network-based training of the embedding more efficient.
• Instead of counting how often each word w occurs near “apricot”, train a (neural network) classifier on a binary prediction task: “Is w likely to show up near ‘apricot’?” This is called Language Modelling in NLP.
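A hedged Keras sketch of this binary prediction task (essentially skip-gram with negative sampling): the model scores (target, context) word pairs, labeled 1 if the context word really occurred near the target and 0 if it was randomly sampled. All sizes, names, and the random data here are assumptions for illustration.

import numpy as np
from tensorflow import keras

vocab_size, dim = 1000, 50   # illustrative sizes

# Fake training pairs: (target word index, candidate context word index)
# label 1 = context word really occurred near the target, 0 = randomly sampled "negative" word
targets  = np.random.randint(0, vocab_size, size=(500, 1))
contexts = np.random.randint(0, vocab_size, size=(500, 1))
labels   = np.random.randint(0, 2, size=(500, 1)).astype("float32")

t_in = keras.Input(shape=(1,), dtype="int32")
c_in = keras.Input(shape=(1,), dtype="int32")
target_emb  = keras.layers.Embedding(vocab_size, dim)    # these weights become the word vectors we keep
context_emb = keras.layers.Embedding(vocab_size, dim)
vt = keras.layers.Flatten()(target_emb(t_in))
vc = keras.layers.Flatten()(context_emb(c_in))
score = keras.layers.Dot(axes=1)([vt, vc])                # similarity of target and context vectors
prob  = keras.layers.Activation("sigmoid")(score)         # P("context word shows up near target word")

model = keras.Model([t_in, c_in], prob)
model.compile(optimizer="adam", loss="binary_crossentropy")
model.fit([targets, contexts], labels, epochs=2, batch_size=32, verbose=0)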
• Two different learning models were introduced that can be used as part of the word2vec approach to learn the word embedding:
  – Continuous Bag-of-Words (CBOW) model
  – Continuous Skip-Gram model
• The CBOW model learns the embedding by predicting the current word based on its context.
• The continuous skip-gram model learns by predicting the surrounding words given a current word.
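A hedged sketch of training both variants with the gensim library (a library choice assumed here, not prescribed by the slides); the sg flag switches between CBOW and skip-gram.

# Requires gensim 4.x
from gensim.models import Word2Vec

sentences = [["amazing", "visual", "effects"],
             ["the", "visual", "effects", "were", "amazing"]]   # tiny made-up corpus

cbow      = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=0)  # sg=0: CBOW (predict word from context)
skip_gram = Word2Vec(sentences, vector_size=50, window=5, min_count=1, sg=1)  # sg=1: skip-gram (predict context from word)

print(cbow.wv["visual"][:5])                       # first few dimensions of a learned vector
print(skip_gram.wv.most_similar("visual", topn=2))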
Embeddings in Word Prediction
[Figures: embeddings used for word prediction, from Speech and Language Processing (3rd ed. draft), Dan Jurafsky and James H. Martin]
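A minimal hedged sketch of one common setup for word prediction with embeddings, a fixed-window feedforward language model; the sizes are illustrative and not taken from the slides or the figures.

from tensorflow import keras

# Illustrative sizes: predict word t from the embeddings of words t-3, t-2, t-1
vocab_size, context_size, dim = 1000, 3, 50

model = keras.Sequential([
    keras.Input(shape=(context_size,)),
    keras.layers.Embedding(vocab_size, dim),              # look up the three context-word embeddings
    keras.layers.Flatten(),                               # concatenate them into one vector
    keras.layers.Dense(128, activation='relu'),           # hidden layer
    keras.layers.Dense(vocab_size, activation='softmax')  # probability of each word in V being the next word
])
model.compile(optimizer='adam', loss='sparse_categorical_crossentropy')
model.summary()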
Code Example (1): The Embedding Layer in Keras

The Embedding layer turns positive integers (indexes) into dense vectors of fixed size, e.g. [[4], [20]] -> [[0.25, 0.1], [0.6, -0.2]]. So, this layer is used to learn an embedding from scratch.

Arguments
• input_dim: int > 0. Size of the vocabulary, i.e. maximum integer index + 1.
• output_dim: int >= 0. Dimension of the dense embedding.

import numpy as np
from keras.models import Sequential
from keras.layers import Embedding

model = Sequential()
model.add(Embedding(1000, 64, input_length=10))
# The model will take as input an integer matrix of size (batch, input_length).
# The largest integer (i.e. word index) in the input should be
# no larger than 999 (vocabulary size).
# Now model.output_shape == (None, 10, 64), where None is the batch dimension.

input_array = np.random.randint(1000, size=(32, 10))

model.compile('rmsprop', 'mse')
output_array = model.predict(input_array)
assert output_array.shape == (32, 10, 64)
Code Example (2): Embedding in a FeedForward Network for Text Classification
model = keras.Sequential([
    keras.layers.Embedding(encoder.vocab_size, 16),
    keras.layers.GlobalAveragePooling1D(),
    keras.layers.Dense(16, activation='relu'),
    keras.layers.Dense(1, activation='sigmoid')])
1. The first layer is an Embedding layer. This layer takes the integer-encoded vocabulary and looks up the embedding vector for each word-index. These vectors are learned as the model trains. The vectors add a dimension to the output array. The resulting dimensions are: (batch, sequence, embedding).
2. Next, a GlobalAveragePooling1D layer returns a fixed-length output vector for each example by averaging over the sequence dimension. This allows the model to handle input of variable length, in the simplest way possible.
3. This fixed-length output vector is piped through a fully-connected (Dense) layer with 16 hidden units.
4. The last layer is densely connected with a single output node. Using the sigmoid activation function, this value is a float between 0 and 1, representing a probability, or confidence level.
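To actually train this model, one would compile and fit it along these lines. This is a hedged sketch: the slides do not show the training call, and train_data / train_labels are assumed to be a padded, integer-encoded dataset prepared elsewhere.

model.compile(optimizer='adam',
              loss='binary_crossentropy',
              metrics=['accuracy'])
# train_data: padded integer-encoded reviews; train_labels: 0/1 sentiment labels (both assumed to exist)
history = model.fit(train_data, train_labels,
                    epochs=10, batch_size=512, validation_split=0.2)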
Code Example (3): Embedding in a RNN Network for Text Classification
• With two stacked Bidirectional LSTM layers
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(encoder.vocab_size, 64),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(64, return_sequences=True)),
    tf.keras.layers.Bidirectional(tf.keras.layers.LSTM(32)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(1, activation='sigmoid')])
Code Example (4): Using Pre-trained Word Embeddings (GloVe)
• See the example on the Keras documentation site: https://keras.io/examples/pretrained_word_embeddings/
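The gist of that example, as a condensed and hedged sketch: read pre-trained GloVe vectors from a local file into an embedding matrix and use it to initialize a frozen Embedding layer. The file name, the toy word_index, and the dimensions are assumptions for illustration; see the linked page for the full version.

import numpy as np
from tensorflow import keras

embedding_dim = 100
word_index = {"amazing": 1, "visual": 2, "effects": 3}   # toy vocabulary; normally built by a tokenizer

# Read pre-trained GloVe vectors (file path assumed; download glove.6B from the GloVe website)
embeddings_index = {}
with open("glove.6B.100d.txt", encoding="utf-8") as f:
    for line in f:
        values = line.split()
        embeddings_index[values[0]] = np.asarray(values[1:], dtype="float32")

# Build an embedding matrix aligned with our own word indices
embedding_matrix = np.zeros((len(word_index) + 1, embedding_dim))
for word, i in word_index.items():
    vector = embeddings_index.get(word)
    if vector is not None:              # words not found in GloVe stay all-zeros
        embedding_matrix[i] = vector

# Embedding layer initialized with the pre-trained vectors and kept frozen during training
embedding_layer = keras.layers.Embedding(
    len(word_index) + 1, embedding_dim,
    embeddings_initializer=keras.initializers.Constant(embedding_matrix),
    trainable=False)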