CSE 291: Trends in Recommender Systems and Human Behavioral Modeling Week 10 project presentations


Page 1: Week 10 project presentations

CSE 291: Trends in Recommender Systems and Human Behavioral Modeling

Week 10 project presentations

Page 2: Week 10 project presentations

Neural Rating Regression with Abstractive Tips Generation for Recommendation

Balasubramaniam Srinivasan, Nitin Kalra, Prem Nagarajan

Page 3: Week 10 project presentations

Problem Statement

Given a user and an item, simultaneously predict a precise rating and generate tips.

Page 4: Week 10 project presentations

Dataset: Amazon

Rating regression categories: Electronics, Movies, Books (size ~ 1 GB each)

Multi-task learning: Pet Supplies, Arts & Crafts, Cell Phone accessories

Category       #Users     #Items     #Reviews
Books          603,668    367,982    8,887,781
Electronics    192,403    63,001     1,684,779
Movies & TV    123,960    50,052     1,697,533

Page 5: Week 10 project presentations

Architecture

Page 6: Week 10 project presentations

Baseline model
● Deep Learning based framework named NRT (Neural Rating and Tips generation)
● A multi-layer perceptron maps the user and item latent factors to a rating
● A Gated Recurrent Unit (GRU) translates the user and item latent factors into tips
● Uses a beam search algorithm to generate tips from the trained model
● A multi-task learning framework integrates rating prediction and tips generation via the joint objective function

Page 7: Week 10 project presentations

Evaluation metrics
● For the rating prediction task:
  ○ Mean Absolute Error (MAE)
  ○ Root Mean Square Error (RMSE)
● For the tip generation task:
  ○ ROUGE-N score
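The MAE and RMSE formulas appeared as images in the slides; the standard definitions consistent with these bullets (with r̂ the predicted rating, r the ground-truth rating, and N test pairs) are:

```latex
\mathrm{MAE} = \frac{1}{N}\sum_{u,i}\left|\hat{r}_{u,i}-r_{u,i}\right|,
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{u,i}\left(\hat{r}_{u,i}-r_{u,i}\right)^{2}}
```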

Page 8: Week 10 project presentations

Extension 1
● Effect of the following metadata on the ratings:
  a. Also viewed
  b. Also bought
  c. Bought together
● Modelling the new features as graphs
● Learning from the node2vec representation of the nodes (see the sketch below)
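Since the node2vec step is only named on the slide, here is a minimal sketch of one way it could look, assuming item metadata in a simple dict format and using uniform random walks fed to word2vec (the special case of node2vec with p = q = 1); all names and the toy data are illustrative, not the presenters' code:

```python
# Build an item graph from "also viewed" / "also bought" metadata and learn
# node embeddings from uniform random walks (DeepWalk-style).
import random
import networkx as nx
from gensim.models import Word2Vec

def build_item_graph(metadata):
    """metadata: dict item_id -> {"also_viewed": [...], "also_bought": [...]} (assumed format)."""
    g = nx.Graph()
    for item, related in metadata.items():
        for other in related.get("also_viewed", []) + related.get("also_bought", []):
            g.add_edge(item, other)
    return g

def random_walks(g, num_walks=10, walk_length=20):
    walks = []
    for _ in range(num_walks):
        for node in g.nodes():
            walk = [node]
            while len(walk) < walk_length:
                neighbors = list(g.neighbors(walk[-1]))
                if not neighbors:
                    break
                walk.append(random.choice(neighbors))
            walks.append([str(n) for n in walk])
    return walks

g = build_item_graph({"i1": {"also_viewed": ["i2"], "also_bought": ["i3"]},
                      "i2": {"also_viewed": ["i3"]}, "i3": {}})
model = Word2Vec(random_walks(g), vector_size=128, window=5, min_count=0, sg=1)
item_vec = model.wv["i1"]  # 128-d item embedding, usable as an extra feature for the rating model
```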

Page 9: Week 10 project presentations

Extension 2
● Using the factoid answers dataset to improve rating prediction and tip generation
● Contains question and answer data from Amazon

Page 10: Week 10 project presentations

Results for rating prediction

                          Books *(Sampled Down)     Electronics        Movies & TV
                          MAE        RMSE           MAE      RMSE      MAE      RMSE
Baseline model (NRT)      --         --             0.805    1.060     0.885    1.130
NRT + Also viewed (128)   *          *              0.794    1.039     0.921    1.126
NRT + Also bought (128)   *          *              0.802    1.052     0.905    1.119

Page 11: Week 10 project presentations

Results for tip generation

Pet Supplies:
             ROUGE-1                ROUGE-2              ROUGE-L
             F1      P      R       F1    P     R        F1     P      R
NRT          27.98   22.11  45.67   2.03  1.68  3.63     22.84  21.66  45.67
NRT + Q/A    28.31   22.32  46.09   2.13  1.56  4.27     23.09  21.83  45.88

Arts and Crafts:
             ROUGE-1                ROUGE-2              ROUGE-L
             F1      P      R       F1    P     R        F1     P      R
NRT          30.85   26.06  44.90   1.21  1.12  1.88     26.33  25.59  44.90
NRT + Q/A    31.14   23.33  55.84   0.86  0.79  1.13     28.72  26.98  55.84

Page 12: Week 10 project presentations

Results for tip generation

Cell Phone Accessories:
             ROUGE-1                ROUGE-2              ROUGE-L
             F1      P      R       F1    P     R        F1     P      R
NRT          28.25   19.82  60.21   1.02  0.64  2.68     20.86  19.40  60.21
NRT + Q/A    29.08   25.73  43.27   0.45  0.34  0.71     23.22  22.09  43.14

Page 13: Week 10 project presentations

Results for rating prediction (multi-task categories)

Pet Supplies:
             MAE     RMSE
NRT          0.712   0.822
NRT + Q/A    0.706   0.784

Arts and Crafts:
             MAE     RMSE
NRT          0.543   0.9310
NRT + Q/A    0.543   1.087

Page 14: Week 10 project presentations

Results for tip generation

NRT

NRT + Q/A

Cell PhoneAccessories:

MAE RMSE

0.539 0.487

0.493 0.477

Page 15: Week 10 project presentations

Limitations
● Large datasets
● Model is compute intensive
● Extensions are compute intensive

Page 16: Week 10 project presentations

Work in Progress

● Analyze the importance of time or season on product ratings and reviews
  ○ Capturing user and item state
● Books dataset sampling

Page 17: Week 10 project presentations

Thank you!

Page 18: Week 10 project presentations
Page 19: Week 10 project presentations

Extension to Neural Collaborative Filtering

Wen Liang, Zeng Fan

Page 20: Week 10 project presentations

Original Paper

Presents the GMF (Generalized Matrix Factorization) and NeuMF models.

Page 21: Week 10 project presentations

Motivations

Use user and item attributes in the dataset

Tackle the sparsity issue

Page 22: Week 10 project presentations

Dataset
● MovieLens
  ○ User-movie ratings
  ○ User information: gender, age, occupation
  ○ Movie information: genre (e.g. adventure, comedy, etc.)
● Pinterest
  ○ User-item pairs
  ○ Number of each user’s pins
  ○ User’s category

Page 23: Week 10 project presentations

Evaluation and Metrics
● Evaluation
  ○ Leave-one-out evaluation: for each user, hold out one user-item interaction for the test set
● Metrics
  ○ Hit Ratio@10
  ○ Normalized Discounted Cumulative Gain (NDCG)@10
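A minimal sketch of this leave-one-out protocol, assuming (as in the NCF paper) that the held-out positive item is ranked against a set of sampled negatives; names and the toy scores are illustrative:

```python
# Compute HR@10 and NDCG@10 for one user from a ranked candidate list containing
# the single held-out positive item plus sampled negatives.
import math

def hit_ratio_at_k(ranked_items, held_out_item, k=10):
    return 1.0 if held_out_item in ranked_items[:k] else 0.0

def ndcg_at_k(ranked_items, held_out_item, k=10):
    # With a single relevant item, NDCG reduces to 1 / log2(rank + 2) (0-indexed rank).
    if held_out_item in ranked_items[:k]:
        rank = ranked_items.index(held_out_item)
        return 1.0 / math.log2(rank + 2)
    return 0.0

# scores: item -> predicted score for one user; higher is better.
scores = {"held_out": 0.93, "neg_1": 0.95, "neg_2": 0.40, "neg_3": 0.10}
ranked = sorted(scores, key=scores.get, reverse=True)
print(hit_ratio_at_k(ranked, "held_out"), ndcg_at_k(ranked, "held_out"))
```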

Page 24: Week 10 project presentations

Revisit the NeuMF Model
NeuMF combines GMF and MLP to better capture implicit user-item relationships.

Using only the GMF model is efficient and costs little performance.

Page 25: Week 10 project presentations

Attribute-aware deep CF model
● An extension of the NeuMF model
● Social network based
● Adds a pooling layer above the embedding layer

Wang et al. (2017), Item Silk Road: Recommending Items from Information Domains to Social Users

Page 26: Week 10 project presentations

Proposed Model
● Use a shared user embedding to address the cold-start problem
● Use a weight to balance the element-wise products between pairs of user, item, and attribute vectors

Page 27: Week 10 project presentations

Results

Page 28: Week 10 project presentations

Hit Ratio@10

Page 29: Week 10 project presentations

Normalized Discounted Cumulative Gain (NDCG@10)

Page 30: Week 10 project presentations

Training Loss vs. Epochs

Page 31: Week 10 project presentations

Questions

Page 32: Week 10 project presentations

Final Project: A Synthetic Approach for Recommendation

Yan Cheng, Moyuan Huang

Page 33: Week 10 project presentations

Overview

1. Objective: predict customer ratings for businesses

2. Metric: root mean square error

3. Dataset: subset of Yelp

4. Models

Page 34: Week 10 project presentations

Dataset

1. Yelp dataset

2. Select 5,000 data points for simplicity

3. To avoid sparsity in the recommendation matrix, we work with users who have more than 30 reviews

Page 35: Week 10 project presentations

Dataset

1. Ratings: business_id, business_stars, user_id, and user_average_stars

2. Relations: user_id and friend_ids

3. Reviews: business_id, user_id, rating, and review_text

Page 36: Week 10 project presentations

Model Overview

1. Basic Model
   a. Mean estimation
   b. Matrix Factorization

2. MF with latent factors

3. Topic MF
   a. original version
   b. modified version

4. Social MF
   a. Friend relation
   b. Social popularity
   c. User similarity

Page 37: Week 10 project presentations

Model Overview

● Basic Model
  a. Mean estimation:
     rating = mean(ratings) + [mean(user) - mean(ratings)] + [mean(business) - mean(ratings)]
  b. Matrix Factorization: sklearn.decomposition.NMF
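A small sketch of the two basic models named above, using illustrative data; note that NMF here treats missing ratings as zeros, which is a simplification of the presenters' pipeline:

```python
# (a) mean/bias estimate and (b) sklearn NMF on a tiny user-business rating matrix.
import numpy as np
from sklearn.decomposition import NMF

R = np.array([[5, 3, 0],      # rows = users, cols = businesses, 0 = missing
              [4, 0, 1],
              [1, 1, 5]], dtype=float)
mask = R > 0

# (a) Mean estimation: global mean plus user and business offsets.
global_mean = R[mask].mean()
user_mean = np.array([row[row > 0].mean() for row in R])
biz_mean = np.array([col[col > 0].mean() for col in R.T])
pred_mean = global_mean + (user_mean[:, None] - global_mean) + (biz_mean[None, :] - global_mean)

# (b) Matrix factorization with sklearn.decomposition.NMF (missing entries treated as 0 here).
nmf = NMF(n_components=2, init="random", random_state=0, max_iter=500)
W = nmf.fit_transform(R)          # user factors
H = nmf.components_               # business factors
pred_nmf = W @ H

rmse = np.sqrt(((pred_nmf[mask] - R[mask]) ** 2).mean())
print(rmse)
```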

Page 38: Week 10 project presentations

Model Overview

● MF with latent factors

Page 39: Week 10 project presentations

Model Overview

● Topic MF (incorporating reviews)
  a. Input: term frequencies (tf) for each review
  b. LDA
  c. Output: vector of topic distributions
● Different implementations
  a. original version did not work out
  b. modified version
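A minimal sketch of the LDA step (term frequencies in, per-review topic distributions out), using scikit-learn and toy reviews; the presenters' actual pipeline may differ:

```python
# Term-frequency vectors per review -> LDA -> topic-distribution vectors,
# which the Topic MF variant can feed into the factorization.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

reviews = ["great food and friendly staff",
           "terrible service, long wait",
           "friendly staff, great service"]

tf = CountVectorizer(stop_words="english").fit_transform(reviews)    # (a) tf for each review
lda = LatentDirichletAllocation(n_components=2, random_state=0)       # (b) LDA
topic_dist = lda.fit_transform(tf)                                    # (c) topic distribution per review
print(topic_dist)   # one vector per review; rows sum to 1
```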

Page 40: Week 10 project presentations

Model Overview

● Topic MF (incorporating reviews)
  a. original version
  b. modified version

Page 41: Week 10 project presentations

Model Overview

● Social MF (social relationship information, WIP)
  a. Friend relation
  b. Social popularity
  c. User similarity

Page 42: Week 10 project presentations

Result

1. Basic Model
   a. Mean estimation: 0.804
   b. Matrix Factorization: 0.800

2. MF with latent factors: ?

3. Topic MF
   a. original version: 0.907
   b. modified version: 0.794

4. Social MF
   a. Relation: 0.796
   b. Popularity: 0.773
   c. User similarity: 0.804

Page 43: Week 10 project presentations

WIP

1. Use word representations instead of bag-of-words

2. Combine the social MF variants

3. Compare performance across models

4. Explain the results

Page 44: Week 10 project presentations

Dynamic Recurrent Network for Next-Basket Recommendation with Attention

TEAM MEMBERS :

KRITI AGGARWAL, SUDHANSHU BAHETY, DIGVIJAY KARAMCHANDANI

Page 45: Week 10 project presentations

Original Paper: A Dynamic Recurrent Model for Next Basket Recommendation (DREAM)

▶ Original DREAM model proposes a dynamic recurrent basket model based on RNN for next basket recommendation

▶ Merges the current items in a user's basket and global sequential basket features, using an RNN (LSTM), into a recurrent, dynamic user representation.

▶ It shows that a nonlinear operation (max-pooling) for learning the basket representation does well at capturing elaborate interactions among multiple factors of items (i.e., item embeddings are learned as part of the network using a feed-forward network).

▶ Extensive experiments on two public datasets (T-mall and Ta-Feng) demonstrated the effectiveness of the proposed model.

Page 46: Week 10 project presentations

Original Network Architecture

Page 47: Week 10 project presentations

Extension 1: Implementing and Adapting the DREAM Model to Instacart Dataset

▶ We used the Instacart Market Analysis dataset as the original datasets were not available to us.

▶ This dataset was chosen because our literature review found it to be the closest to the original datasets.

▶ We needed to communicate with the authors to clarify certain parts of the paper.

▶ Implemented the original DREAM model in PyTorch.
▶ Dataset description:

▶ Anonymized; contains a sample of over 3 million grocery orders from more than 200,000 Instacart users.

▶ Each user has between 4 and 100 orders, with the sequence of products purchased in each order.

Page 48: Week 10 project presentations

Extension 2: Adding Attention to the DREAM MODEL

▶ We took the idea of adding attention from the ICLR 2015 paper “NEURAL MACHINE TRANSLATION BY JOINTLY LEARNING TO ALIGN AND TRANSLATE”.

At time t:
▶ Y_t is the representation of the input basket.
▶ s_{t-1} is the hidden representation of the user (the LSTM hidden state).
▶ (+): we add attention for a weighted focus on previous user hidden states.
▶ After attention we get a context vector C_i, the same size as s_{t-1}.
▶ C_i represents the weighted sum of previous user representations, so that the model attends to the most important user hidden factors.

Page 49: Week 10 project presentations

Adding Attention to the DREAM MODEL

▶ Drawing parallels from attention used successfully in Seq2Seq models, we wanted the LSTM to take input the most important parts of the last few user hidden representations.

▶ The hidden representations of the users are captured at each time step t.

▶ The attention is based on an alignment score, i.e., how correlated the current input is to each of the previous baskets in a window.

▶ We have a hyper-parameter k, which decides the appropriate window size over previous baskets (see the sketch below).
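A sketch of what this attention step could look like in PyTorch, assuming dot-product alignment scores over the last k user hidden states; shapes and names are illustrative, not the presenters' code:

```python
# Score the current basket representation against the last k user hidden states,
# then form a context vector as their weighted sum.
import torch
import torch.nn.functional as F

def attention_context(basket_repr, prev_hidden, k=5):
    """basket_repr: (batch, d); prev_hidden: (batch, T, d) previous user hidden states."""
    window = prev_hidden[:, -k:, :]                        # keep only the last k states
    scores = torch.bmm(window, basket_repr.unsqueeze(2))   # (batch, k, 1) dot-product alignment
    weights = F.softmax(scores.squeeze(2), dim=1)          # (batch, k) attention weights
    context = torch.bmm(weights.unsqueeze(1), window)      # (batch, 1, d) weighted sum
    return context.squeeze(1)                              # (batch, d), same size as a hidden state

y_t = torch.randn(8, 32)          # current basket representation
s_prev = torch.randn(8, 12, 32)   # user hidden states from earlier time steps
c_t = attention_context(y_t, s_prev, k=5)   # fed to the LSTM together with y_t
```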

Page 50: Week 10 project presentations

RESULTS

▶ Due to computational limitations, we sampled 10% of the dataset.

▶ We ran the DREAM model on 32,000 users and 44,440 unique items.

▶ Padding was performed differently on each batch

▶ Runtime: 500s for each epoch.

▶ The model saturated after about 10 epochs.

▶ Final Results :

            F1@k     NDCG     Precision@k   Recall@k
Baseline    0.0548   1.2688   0.2822        0.0367
Our Model   0.0493   1.2377   0.2767        0.0303

Page 51: Week 10 project presentations

Key Takeaways and Future Work
▶ Attention did not help in our case.

▶ Learning item embeddings as part of the network, with max pooling as a nonlinear operation, does well at capturing elaborate interactions among multiple factors of items.

▶ Padding specific to each basket performs better than having the same pad length.

Page 52: Week 10 project presentations

Questions ?

Page 53: Week 10 project presentations

Extensions on Generating and Personalizing Bundle Recommendations on Steam

Yiwen Gong, Siyu Jiang, Kuang-Hsuan Lee

Page 54: Week 10 project presentations

Objectives
1. Predict the preference rating of items/bundles given the user
2. Recommend bundles to the given user according to their preference
3. Generate new personalized bundles

Page 55: Week 10 project presentations

Original paper: Architecture

[Architecture diagram: user-item data feeds an Item BPR model and user-bundle data feeds a Bundle BPR model (parameters B_i, P_u, Q_i); candidate items are added to an initial bundle to form candidate bundles, which are compared to produce the recommended bundle.]

Page 56: Week 10 project presentations

Weaknesses of the Original Paper
● Naive model; more features could be considered given the data
  ○ Game genre
  ○ Bundle discount rate
● Unstable results; model AUC varies from 0.63 to 0.88
● Users share a preference for bundle diversity
● Flaws in bundle generation
  ○ Generates new bundles consisting only of items from existing bundles
  ○ Tends to include popular items in bundles (to increase profit, common bundles usually carry a small number of unpopular items)

Page 57: Week 10 project presentations

Original Bundle Ranking

● 2-step BPR

○ Item BPR

○ Bundle BPR

Page 58: Week 10 project presentations

Extended Bundle Ranking

● 2-step BPR

○ Item BPR

○ Bundle BPR

Page 59: Week 10 project presentations

Extended Bundle Ranking: discount effect

Bundle discount: [0, 1] → tan(x), x: [-π/2, π/2] → sigmoid(x): [0, 1] → Discount effect: [0, T_u]

Discount      Increasing utility
0% - 10%      high
40% - 50%     few
90% - 100%    high
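One plausible reading of the mapping chain above (an assumption, since the formula itself was an image): the discount d in [0, 1] is rescaled to [-π/2, π/2], passed through tan and then a sigmoid, and scaled by a per-user parameter T_u:

```latex
\mathrm{effect}(d) = T_u \cdot \sigma\!\left(\tan\left(\pi\left(d - \tfrac{1}{2}\right)\right)\right),
\qquad d \in [0,1],\quad \sigma(x) = \frac{1}{1+e^{-x}}
```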

Page 60: Week 10 project presentations

Original Bundle Generation: Method and Results
● Recommends bundles with items a user has already bought
● For 100 users, only 7 of the items are new to everyone
● AUC is not always right!
● People can buy the popular items without a recommendation system; one goal of the system is to activate the user's demand

Page 61: Week 10 project presentations

Extended Bundle Generation
● The Item BPR model has learnt information about items outside of all existing bundles, so our bundle generation is able to generate bundles with items new to existing bundles
● Ensure generated bundles consist only of items a user never bought
● Tends to include unpopular items for profit considerations

Page 62: Week 10 project presentations

New Bundle Generation Algorithm (see the sketch below)
1. Assign a picking probability to each item based on its popularity; less popular items get higher probability.
2. Initialize a bundle with a random size from [Average Bundle Size - s, Average Bundle Size + s]. Items are chosen according to their assigned probabilities P_u,i.
3. Generate a candidate set. Choose half of the items in the set from items not bought; choose the other half from all items using P_u,i.
4. Generate new bundles by adding, deleting, or replacing items in the initial bundle using items from the candidate set.
5. Choose the bundle with the largest X_u,b as the new bundle.
6. Repeat steps 3 to 5 until convergence.
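A compact sketch of this generation loop as a simplified local search; `score_bundle` stands in for the learned X_u,b, the popularity-weighted sampling stands in for P_u,i, and all helper names and the toy demo are illustrative:

```python
import random

def generate_bundle(items, popularity, bought, score_bundle, avg_size=4, s=1, iters=20):
    # Step 1: less popular items get higher picking probability.
    inv = {i: 1.0 / (1 + popularity[i]) for i in items}
    total = sum(inv.values())
    p_pick = {i: w / total for i, w in inv.items()}

    def sample(pool, n):
        pool = list(pool)
        return random.choices(pool, weights=[p_pick[i] for i in pool], k=min(n, len(pool)))

    # Step 2: initial bundle with a random size in [avg_size - s, avg_size + s].
    size = random.randint(avg_size - s, avg_size + s)
    bundle = set(sample(items, size))

    for _ in range(iters):                                    # Steps 3-6
        not_bought = [i for i in items if i not in bought]
        candidates = set(sample(not_bought, size)) | set(sample(items, size))
        best = bundle
        for c in candidates:                                  # add / replace moves only (simplified)
            for trial in (bundle | {c}, (bundle - {random.choice(list(bundle))}) | {c}):
                if score_bundle(trial) > score_bundle(best):
                    best = trial
        if best == bundle:                                    # no improving move: converged
            break
        bundle = best
    return bundle

items = list(range(10))
pop = {i: i for i in items}   # item 9 most popular in this toy example
print(generate_bundle(items, pop, bought={0, 1},
                      score_bundle=lambda b: -sum(pop[i] for i in b)))
```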

Page 63: Week 10 project presentations

Results - Bundle Ranking

Page 64: Week 10 project presentations

Results - Bundle Generation

Page 65: Week 10 project presentations

t-SNE embedding of latent representations

Page 66: Week 10 project presentations

Conclusion
● We propose several extensions to the original bundle recommendation method.
● Our method achieves a large improvement in BPR ranking results over the original method.
● Our method achieves better and more reasonable bundle generation for specific users.

Page 67: Week 10 project presentations

Limitations & Future Work
● About 97% of users bought fewer than 10 bundles. If a user only bought a few items or bundles, it is hard to estimate the user's sensitivity to bundle prices.
● The current model generates bundles based on the preferences of users. However, without knowing the commercial information (cost, etc.), it is hard to generate bundles that are beneficial for game distributors.

Page 68: Week 10 project presentations
Page 69: Week 10 project presentations

TransRec: Smarter Translation Vectors

Rajiv Pasricha

Page 70: Week 10 project presentations

Original Paper

Translation-based Recommendation, by Ruining He, Wang-Cheng Kang, and Julian McAuley

● Sequential model for recommendation
  ○ Embed users and items into a low-dimensional "translation space"
  ○ Each user travels along their personalized trajectory of item interactions

Page 71: Week 10 project presentations

The TransRec Model

● Probability of next item j given user u and previous item i
● β_j = item bias (captures overall item popularity)
● d = distance function (e.g. L1 or L2)
● γ_i = previous item factors, γ_j = next item factors
● T_u = user translation vector
● Φ, Ψ = transition space and subspace; restricting the factors helps regularization (TransRec: L2 ball)
● Trained using a sequential BPR loss with SGD
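The probability expression itself appeared as an image; a reconstruction consistent with the symbols listed above (and with the original TransRec paper) is:

```latex
\mathrm{Prob}(j \mid u, i) \;\propto\; \beta_j - d\left(\gamma_i + T_u,\; \gamma_j\right),
\qquad \gamma_i + T_u \in \Phi,\;\; \gamma_j \in \Psi \subseteq \Phi
```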

Page 72: Week 10 project presentations

Datasets and Evaluation

Evaluation: AUC

Page 73: Week 10 project presentations

Extensions: Personalization

● Personalized translation vector
  ○ Model "typical" sequences of items that are common across users
  ○ AUC on the Amazon Video Games dataset: 0.7610 → 0.7633

Page 74: Week 10 project presentations

Extensions: Temporal Dynamics

● Time Delta model
  ○ Incorporate the time delay between interactions
  ○ Interactions that are farther apart can have larger translations between them
  ○ Amazon Video Games dataset: 0.7610 → 0.7544
● Personalized Time Delta model
  ○ Add a user-specific scaling factor to the above time deltas
  ○ Learn the scaling factor from the data
  ○ Amazon Video Games dataset: 0.7610 → 0.7570

Page 75: Week 10 project presentations

Extensions: Extra Translation Vector

● Introduce separate user offsets for short-term and long-term interactions
  ○ Learn two translation vectors per user, with a threshold at time delay = 6 months
  ○ Allow users to exhibit different tendencies based on temporal data
  ○ If the delay is under 6 months, the short-term vector is used; if over 6 months, the long-term vector is used
  ○ Amazon Video Games dataset: 0.7610 → 0.7646

Page 76: Week 10 project presentations

Extensions: Nonlinear Translation Vectors

● Use a neural network to model more complex translation relationships
● (1) Model a nonlinear relationship between the previous item and the user translation vector
  ○ Amazon Video Games dataset: 0.7610 → 0.7661
● (2) Directly estimate the probability of transitioning to the next item
  ○ Amazon Video Games dataset: 0.7610 → 0.7552

Page 77: Week 10 project presentations

Extensions: Nonlinear Temporal Models

● Add temporal information to the nonlinear neural network models
● Add the delta between the previous and next interaction times
  ○ Neural net translation vector model: Amazon Video Games dataset: 0.7610 → 0.7661 → 0.7665
  ○ Neural net distance model: Amazon Video Games dataset: 0.7610 → 0.7552 → 0.7629
● Add the raw previous and next interaction times
  ○ Neural net translation vector model: Amazon Video Games dataset: 0.7610 → 0.7661 → 0.7662
  ○ Neural net distance model: Amazon Video Games dataset: 0.7610 → 0.7552 → 0.7661

Page 78: Week 10 project presentations

Visualization

● “Transition space” learned by the model (when k = 2)

Training sequence of items for one user in the dataset

Page 79: Week 10 project presentations

Visualization (without normalization)

● “Transition space” learned by the model (when k = 2)

Training sequence of items for one user in the dataset

Page 80: Week 10 project presentations

Discussion and Future Work

● Adding nonlinear translation vectors helps the model learn more complex relationships between items.

● Adding temporal information helps when integrated with nonlinear models.

● It will be helpful to also compare results using different evaluation metrics in addition to AUC, e.g. Hit@50

● Additional visualizations; come up with a model that more clearly arranges items sequentially in the transition space.

Page 81: Week 10 project presentations

Questions?

Page 82: Week 10 project presentations

Extensions to Personalized Ranking Metric Embedding

(PRME)

- Shreyas Udupa Balekudru

Page 83: Week 10 project presentations

Problem Statement

Next New POI Recommendation problem: new POIs with respect to the user's current location are to be recommended.

Input: User ID, Current POI, Physical Location (Latitude and Longitude), Check-In Time
Output: Recommended POI

Page 84: Week 10 project presentations

Dataset

Foursquare check-ins
Check-ins in Singapore between 08/2010 and 07/2011
Number of check-ins = 151,589
Number of users = 2,321
Number of POIs = 5,596

Page 85: Week 10 project presentations

Training for PRME

Sequential transition space and user preference space (weights of the metric spaces parameterized)
Stochastic Gradient Descent
Model parameters initialized with a normal distribution
If the check-in time difference is greater than a threshold, only parameters in the user preference space are updated.

Page 86: Week 10 project presentations

Hyperparameters Used

K = 20
Number of iterations = 1000
Alpha = 0.2
Learning rate = 0.005
Regularization factor = 0.03

Page 87: Week 10 project presentations

Incorporating Distance (PRMEG)

Include geographical distance as a multiplicative factor in the distance metric.
Users prefer visiting nearer POIs over farther POIs.

Page 88: Week 10 project presentations

Issues Faced

Units for distance are not specified in the original paper.
Training time for higher values of k is prohibitive (training algorithm complexity: O(IK|C|)).
Dates in the data are specified with an ID. It is unclear if consecutive IDs represent consecutive days.

Page 89: Week 10 project presentations

Evaluation Metric
Mean Reciprocal Rank (MRR)

where Q is the number of queries and Rank_i is the rank of the next POI in the test set, compared against 20 randomly sampled 'negative' POIs from the dataset.
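The formula itself was an image; the standard MRR definition matching this description is:

```latex
\mathrm{MRR} = \frac{1}{|Q|}\sum_{i=1}^{|Q|}\frac{1}{\mathrm{Rank}_i}
```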

Page 90: Week 10 project presentations

Results

The higher the value of k, the better the results.
Not all functions of the geographical distance lead to improved performance.

Page 91: Week 10 project presentations

Visualization

Page 92: Week 10 project presentations

Work in Progress

Re-evaluating the PRMEG results from the paper as a sanity check.
Interpretation of the metric embedding visualization.
Evaluating a PRMEG-like approach for new product recommendation using rating as a distance metric.

Page 93: Week 10 project presentations

Questions?

Page 94: Week 10 project presentations

Wednesday Presentations

Page 95: Week 10 project presentations

Personalized Next Song Recommendation

Kiran Kannar, Rahul Dubey
Dec 06, 2017

Page 96: Week 10 project presentations

Problem Statement

Given a user's song listening history, provide personalized next-song recommendation using metric embeddings.

[Figure: example listening sequence s1-s4 with "Viva La Vida" (Coldplay), "Just The Way You Are" (Bruno Mars), "Firework" (Katy Perry), and an unknown next song to predict]

Page 97: Week 10 project presentations

Datasets

Measure                Now Playing   30 Music
# sessions             9,288         100,000
# users                1,032         7,146
# tracks               76,652        694,817
Avg. # sessions/user   9             9.33

Page 98: Week 10 project presentations

PRME Model

Transition Probability

MAP

Gradient update equation

Page 99: Week 10 project presentations

PRME-Au: Personalizing alpha

- Non-convex problem
- Use an alternating minimization technique
- Empirical results showed random normal clipping works better than sigmoid/tanh or 0/1 clipping
- Best initialization: initialize to the global alpha of PRME!
- Bounding the tradeoff works better than an unbounded tradeoff

Page 100: Week 10 project presentations

PRME Social
Similarity Score (asymmetric)

MAP

Gradient update equation

Page 101: Week 10 project presentations

Results

Page 102: Week 10 project presentations

AUC vs iterations

Now Playing 30 Music

Page 103: Week 10 project presentations

Metrics vs. Dimensions

MRR Hit Rate

Page 104: Week 10 project presentations

Visualizing songs in sequence space

Page 105: Week 10 project presentations
Page 106: Week 10 project presentations
Page 107: Week 10 project presentations
Page 108: Week 10 project presentations
Page 109: Week 10 project presentations

Alpha_U statistics - I: (30M dataset)

Median: 0.1964205

Mean: 0.216677464557

Standard deviation: 0.146711865691

Minimum value: 9.9e-05

Maximum value: 0.931338

Page 110: Week 10 project presentations

Alpha_U statistics - II: (30M dataset)

Page 111: Week 10 project presentations

Alpha_U statistics - III: (30M dataset)

Page 112: Week 10 project presentations

Thank you!

Page 113: Week 10 project presentations

FashionGAN: A generative model for fashion recommendation

By Vignesh Gokul

Page 114: Week 10 project presentations

Base paper
● Learning Visual Clothing Style with Heterogeneous Dyadic Co-occurrences (Andreas Veit, Balazs Kovacs, Sean Bell, Julian McAuley, Kavita Bala and Serge Belongie)
● The paper implements a Siamese CNN with strategic sampling to learn an embedding space for all items, and uses these embeddings to build a better item recommender system

Page 115: Week 10 project presentations

Siamese CNN Architecture

Page 116: Week 10 project presentations

FashionGAN
● A generative model which outputs a compatible image given an input image
● Conditioned on the input image
● Related Work:
  ○ Image-to-Image Translation with Conditional Adversarial Networks

Page 117: Week 10 project presentations

Image to Image Translation with CGANs

Page 118: Week 10 project presentations

FashionGAN

Page 119: Week 10 project presentations

Siamese GAN

Figure: Architecture of the Generator

Page 120: Week 10 project presentations

Siamese GAN(Results)

Page 121: Week 10 project presentations

Evaluation

● Inception score
● Opposite SSIM
● Can improve the Siamese GAN by using variational encoders.

Model                Inception score   Opposite SSIM
Image to Image GAN   3.2733448         0.64017811
Siamese GAN          1.9960622         0.6160975

Page 122: Week 10 project presentations

Another Extension (Work in progress)
● Use deep supervision to improve Siamese CNNs

Page 123: Week 10 project presentations

Questions?

Page 124: Week 10 project presentations

TransNets: Learning to transform for Recommendation

By: Akanksha Grover, Dhruv Sharma, Rishab Gulati

Page 125: Week 10 project presentations

TransNets: Using Review Text for rating prediction

● TransNets represents users and items using the past reviews given by/to them

● Learns a latent representation of the prospective review using the interactions between <User, Item>

● Optimizes the MSE of the ratings produced

● Models the interaction between a user and an item using only the reviews

● TransNet-Ext models the interaction using both the user-item latent vector and the review as input

Page 126: Week 10 project presentations

Dataset and Code
● Original model uses the Yelp 2016 dataset: https://www.yelp.com/dataset_challenge

● We ran the model on Yelp 2017 dataset.

● Data Statistics:

○ 4,700,000 reviews

○ 156,000 businesses

○ 1,100,000 users

● This data is larger than the original data the model was run on when the paper was published.

● Original code taken from: https://github.com/rosecatherinek/TransNets

● Modifications have been done to the above code for extensions

Page 127: Week 10 project presentations

Train, Test and Validation Epochs

● We divided the entire dataset including all reviews into 3 parts - train, test and validation

sets randomly.

● Number of datapoints:

○ Train-3,789,517

○ Test-473,689

○ Validation-473,689

● Due to computational reasons, we limited our experiments to a single training epoch.

● We ran the original model and our modified model for one epoch and compared the MSE

value.

Page 128: Week 10 project presentations

Original Model
● A Target Network that processes the target review rev_AB
● A Source Network that processes the texts of the (user_A, item_B) pair, which do not include the joint review rev_AB
● The original model concatenates all user reviews and all item reviews (except the common review) for the user and the item, up to a length of 1000 words

Page 129: Week 10 project presentations

TransNet Original

1. Running time: 9 hours per epoch

2. Review length = 800

3. Embeddings trained on the top 50K frequent words in Yelp 2017

4. Test MSE: 1.81

Page 130: Week 10 project presentations

Extension 1:

Issues:
● The original model concatenates all reviews into a single composite review
● It does not consider variation in review text across reviews of different ratings
● Requires a large matrix of word embeddings for each user/item/review

Proposed Solution:
● Each column should represent a review's embedding
● Reviews are sorted by rating to allow the CNN to learn the variation via spatial correlation in the matrix
● Requires only a matrix of K x (embedding size)

Page 131: Week 10 project presentations

Extension 1:

Each column is the latent representation of a user review/item review.

Review Embeddings

A set of k user/item reviews sorted by rating in increasing order

Page 132: Week 10 project presentations

Experiments
We sampled the reviews for each user/item using the following methods:

● For all the methods we fixed a threshold (K), i.e. the number of user/item reviews to be sampled. We took values of k=10 and k=20.

● Review embeddings were made by summation of the word embeddings of all the words in a review in the first three experiments.

● Review embeddings were learnt from a separate DeepCoNN network in the last experiment.

● We did a total of 4 experiments for this extension.

● One training epoch takes about 3-4 hours for each experiment.

Page 133: Week 10 project presentations

Experiments
1. Sample K Reviews using User/Item Reviews + Global Sampling
  ○ We randomly sampled 'k' user/item reviews and sorted them in increasing order of review rating.
  ○ If the user or item had fewer than 'k' reviews, we sampled reviews from a global set of all reviews.

MSE: 1.923229

Possible issue: sampling from a global set of reviews might not be relevant for the <user, item> pair.

Page 134: Week 10 project presentations

Experiments
2. User/Item Reviews + Corresponding Item/User Reviews + Global Sampling
  ○ We randomly sampled 'k' user/item reviews and sorted them in increasing order of review rating.
  ○ If the user or item had fewer than 'k' reviews, we sampled reviews of the corresponding item/user from the training data. If the reviews were still fewer than 'k', we sampled the rest from a global set of all reviews.

MSE: 1.858533

Possible issue: most users and items have very few reviews, so most training samples are for cold-start users/items.

Page 135: Week 10 project presentations

Experiments
3. Filtered Training Data + User/Item Reviews + Corresponding Item/User Reviews + Global Sampling
  ○ The sampling process was kept the same as (2), but we filtered the training set to keep only those users/items which had at least 'k' reviews.

MSE: 1.886834

Possible issue: training on data with no cold-start <user, item> pairs did not generalize well to the test set.

Page 136: Week 10 project presentations

Experiments
4. Generate Review Embeddings using DeepCoNN
  ○ The sampling process was kept the same as (2).
  ○ Review embeddings were generated by a separate DeepCoNN network trained on a small sample of the training set with equal representation of each rating.
  ○ Acts as a step to generate pretrained "review embeddings".
  ○ Tried the review embeddings only with Experiment 2.

MSE: 1.730112

Performs the best among all experiments and the baseline.

Page 137: Week 10 project presentations

Results for Extension 1

MSE
Baseline              1.813865
Experiment 1 (k=10)   1.923229
Experiment 2 (k=10)   1.858533
Experiment 2 (k=20)   1.891809
Experiment 3 (k=10)   1.886834
Experiment 4 (k=10)   1.730112

                             Baseline    Extension (k=10)   Extension (k=20)
Number of input parameters   153,600     52,480             53,760
Hours to train per epoch     ~9.5 hrs    ~4 hrs             ~5 hrs

Page 138: Week 10 project presentations

Extension 2
● We show that TransNet can be used for tips generation on the Yelp dataset
● Inspired by the paper "Neural Rating Regression with Abstractive Tips Generation for Recommendation", Piji Li et al.
● The review latent representation learned from TransNets' transform layer is used as a context vector to generate tips

Page 139: Week 10 project presentations

Methodology(1/3)

● The original Yelp dataset has about 400,000 data points that have both reviews and tips. We take the top 50,000 training data points.

● Train the entire TransNet for just 1 epoch.

● Transfer the output of the Transform Layer for each data point in the train set as well as the test set to the RNN.

Page 140: Week 10 project presentations

Methodology (2/3)
● Sequence length = 3
● GloVe embeddings of the most common 50,000 words in reviews.
● Added an <UNK> word to represent all words not in the vocabulary.
● Add embeddings of 2 <UNK> vectors and the embedding of the first word of the tip -> 50-dim vector.
● Concatenated with the 64-dimension vector from TransNet that represents the corresponding review.
● The concatenated vector is fed as the input at each step (see the sketch below).
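A minimal sketch of the decoder described above, assuming the dimensions from the slides (a 64-d TransNet review vector concatenated with a 50-d word embedding at every step of a recurrent decoder); module and variable names are illustrative, not the presenters' code:

```python
import torch
import torch.nn as nn

class TipDecoder(nn.Module):
    def __init__(self, vocab_size, word_dim=50, context_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, word_dim)
        self.gru = nn.GRU(word_dim + context_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, word_ids, context):
        # word_ids: (batch, seq_len); context: (batch, 64) TransNet review vector
        words = self.embed(word_ids)                                  # (batch, seq_len, 50)
        ctx = context.unsqueeze(1).expand(-1, words.size(1), -1)      # repeat the context per step
        hidden, _ = self.gru(torch.cat([words, ctx], dim=2))
        return self.out(hidden)                                       # next-word logits per step

decoder = TipDecoder(vocab_size=50_000)
logits = decoder(torch.randint(0, 50_000, (4, 3)), torch.randn(4, 64))  # sequence length 3, as on the slide
```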

Page 141: Week 10 project presentations

Methodology (3/3)

● We train for 500 epochs, for about 6 hrs.

● For test data, we used 2,000 data points from the original 121k data points.

● At test time, we concatenate the 64-dimensional review vector from TransNet with the 50-dimensional representation of <UNK>.

● This generates the first word of the tip; we then use the embedding of each generated word at each time step, concatenated with the vector from TransNet.

● We sample words based on the output probability.

Page 142: Week 10 project presentations

Generated Tips

made place to beer . . . this amazing i i

they great place my to sushi onion spot in dog .

tea pizza . . week ' chicken items place very service

sale , great . this worth to cardio were this don

lots spot hair some is they be but no amazing tim

Page 143: Week 10 project presentations

Baseline and Evaluation
● We use LexRank as our baseline.
● LexRank produces a summary of the whole review.
● We calculate ROUGE-1 and ROUGE-2 as our evaluation measures.

           LexRank                                TransNet + RNN
Score      Precision   Recall      F-1 Score      Precision   Recall      F-1 Score
ROUGE-1    0.0694001   0.0451601   0.0456172      0.0242294   0.0292329   0.0239379
ROUGE-2    0.0025476   0.0013569   0.0014726      0.0003033   0.0003055   0.0002377

Page 144: Week 10 project presentations

Conclusion & Future Work

1. Pre-trained review embeddings gave the highest boost

2. We are still not very sure about the best way to sample the K reviews of the user/item, and we want to investigate further how review embeddings change our results

3. There is no analysis on robustness to temporal change

4. The absolute value of MSE is very high

5. Future Work:

a. Combine temporal signals and use global, user and item biases

b. Extend the Transnet model to use implicit feedback and ranking prediction

c. Evaluate Tip Generation against more baselines

Page 145: Week 10 project presentations

Questions

Pages 146-162: Week 10 project presentations (figure-only slides, no extracted text)

Neural Collaborative Filtering [He et al. 2017] Extensions

Kulshreshth Dhiman, Sai Kolasani

Page 163: Week 10 project presentations

Overview

● Extensions to Neural Collaborative Filtering [He et al. 2017]
● Extensions
  1. Pairwise Ranking
  2. Cold start
  3. Experiments with Architecture
● Dataset
  ○ MovieLens 20M user-item interactions (converted to implicit feedback)
  ○ Movie features from IMDB

Page 164: Week 10 project presentations

Model
GMF:
● Uses the inner product of the user and item representations in the latent space.

MLP:
● A multi-layer perceptron network using the concatenation of the user and item latent representations as an input feature.

NeuMF:
● Combines GMF and MLP into a single deep network for better accuracy.

Page 165: Week 10 project presentations

Pairwise Ranking Model
GMF pairwise model (shared weights)

● The pairwise networks use shared weights and a shared user embedding.
● The objective function is modified to maximize the difference between the score of a preferred item and that of a non-preferred item.
● During evaluation we tried two approaches:
  ○ Take the output from the final sigmoid layer (calculating the sigmoid of the difference of scores)
  ○ Take the sigmoid of the output from the linear dense layer just before the sigmoid (this works better)
● Similarly designed for the other models

Page 166: Week 10 project presentations

Pairwise Ranking algorithm

● We can find the position of the positive item in the ranking efficiently using batch prediction (see the sketch below).
  ○ We compute the pairwise comparison of the positive item with all the negative items (N) in a single batch and count the number of times the positive item is preferred (k).
  ○ The rank of the positive item is then N - k.
● If an exact ranking of K items is needed, a heap-based algorithm can be used.
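A small sketch of this batch-prediction trick with a toy dot-product scorer; names and data are illustrative:

```python
import numpy as np

def rank_of_positive(score_fn, user, pos_item, neg_items):
    """score_fn(user, items) -> array of scores; higher means preferred."""
    pos_score = score_fn(user, np.array([pos_item]))[0]
    neg_scores = score_fn(user, np.array(neg_items))       # one batched prediction over all negatives
    k = int((pos_score > neg_scores).sum())                # number of pairwise wins for the positive item
    return len(neg_items) - k                              # 0 = ranked above every negative

# Toy scorer: dot product of user and item embeddings.
rng = np.random.default_rng(0)
U, V = rng.normal(size=(10, 8)), rng.normal(size=(100, 8))
score_fn = lambda u, items: V[items] @ U[u]
print(rank_of_positive(score_fn, user=3, pos_item=7, neg_items=list(range(10, 40))))
```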

Page 167: Week 10 project presentations

Cold Start ModelGMF Cold start model

Page 168: Week 10 project presentations

Cold Start ModelMLP Cold start model

Page 169: Week 10 project presentations

Cold Start ModelNeuMF Cold start model

Page 170: Week 10 project presentations

Dataset
● MovieLens 20M* dataset with user movie ratings from 2012 to 2015, converted to implicit feedback data.
● Sampling
  ○ Movies released after 1990
  ○ At least 20 items per user
  ○ Randomly sampled 7,000 users

*https://grouplens.org/datasets/movielens/20m/

Data Statistics

#users           7,000
#items           8,491
#ratings         724K
Sparsity         98.78%
Items per user   Min: 20, Max: 1,980, Median: 59
Users per item   Min: 1, Max: 3,609, Median: 12

Page 171: Week 10 project presentations

Item Features

● Collected item features using http://www.theimdbapi.org/ API

● From the data extracted from IMDB we used the following features (sketched below):
  ○ Year of release: binned years (5-year bins), then one-hot encoding
  ○ Genre: many-hot encoding over 24 different genres
  ○ Text features: we used the gensim doc2vec library to learn vector representations of the storylines of the movies in the training set
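A sketch of the three feature types listed above, with toy data and illustrative names (gensim 4.x Doc2Vec syntax assumed; the slide uses 24 genres, trimmed to 3 here):

```python
import numpy as np
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

GENRES = ["Action", "Comedy", "Drama"]
movies = [{"id": "m1", "year": 1994, "genres": ["Comedy", "Drama"],
           "storyline": "two strangers meet on a train and fall in love"}]

def year_one_hot(year, start=1990, end=2015, width=5):
    vec = np.zeros((end - start) // width)          # one slot per 5-year bin
    vec[min((year - start) // width, len(vec) - 1)] = 1.0
    return vec

def genre_many_hot(genres):
    return np.array([1.0 if g in genres else 0.0 for g in GENRES])

docs = [TaggedDocument(m["storyline"].split(), [m["id"]]) for m in movies]
d2v = Doc2Vec(docs, vector_size=16, min_count=1, epochs=20)   # storyline embeddings

m = movies[0]
item_features = np.concatenate([year_one_hot(m["year"]),
                                genre_many_hot(m["genres"]),
                                d2v.dv[m["id"]]])             # final concatenated item feature vector
print(item_features.shape)
```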

Page 172: Week 10 project presentations

Train-Test set
● Randomly sampled 10% of items as cold-start items; the rest go in the training set
● Test set: latest 2 positive user-item pairs per user
● Test set - Cold-start (completely new items): no user-item pairs in the train set
● Test set - Pseudo cold-start (relatively new items): 10 positive user-item pairs in the train set, the rest in the test set
● Train set: #negatives per positive user-item pair = 4

Page 173: Week 10 project presentations

Evaluation

● For evaluating the performance of the model we use:
  ○ HR: Hit Rate@10
  ○ NDCG: Normalized Discounted Cumulative Gain (NDCG@10), computed as log 2 / log(rank + 2)
  ○ AUC
● Randomly sampled 99 negative user-item pairs (not in the train set) and ranked the positive item among the negative items

Page 174: Week 10 project presentations

Results - Base

Page 175: Week 10 project presentations

Results - Base

Page 176: Week 10 project presentations

Results - Pairwise

Pointwise performed better than pairwise

Page 177: Week 10 project presentations

Results - Cold start
● Cold start models outperformed base models
● Item features improved performance over the general test set

Page 178: Week 10 project presentations

Results - Cold start
● Cold start models performed better than base models
● MLP had higher hit rates

Page 179: Week 10 project presentations

Results - Cold start
● MLP models had relatively high hit rates

Page 180: Week 10 project presentations

Results - Cold start
● NeuMF cold start model had the highest AUC

Page 181: Week 10 project presentations

Architecture Experiments
● NeuMF: shared embeddings for the GMF and MLP models
  ○ GMF and MLP learn different latent spaces
● GMF: add a dense layer after the MF layer (dim = latent_size/2)

           HR       NDCG     AUC
Separate   0.8039   0.5021   0.9288
Shared     0.7962   0.4975   0.9261

HR@10
PF    GMF with dense layer   Base GMF
8     0.7981                 0.7986
16    0.7999                 0.8046
32    0.7999                 0.8046

Page 182: Week 10 project presentations

Conclusion

● Pointwise ranking model works better than pairwise ranking model

● Item features like storyline, genre, and year improve the hit rates for cold-start as well as non-cold-start items

● GMF had higher hit rates for non-cold start items and MLP had higher hit-rates for cold-start items

Page 183: Week 10 project presentations

Questions? Thank you!

Page 184: Week 10 project presentations

Jointly Modeling Aspects, Ratings and Sentiments for Movie Recommendation (JMARS)

Presented By: Rishabh Misra, Tushar Bansal

Page 185: Week 10 project presentations

Problem Statement

● Motivation: Uncovering aspects and sentiments from reviews could provide a better understanding of users, movies (items), and the process involved in generating ratings.

● Approach: Capture the interest distribution of users and the content distribution for movies and provide a link between interest and relevance on a per-aspect basis. Authors also differentiate between positive and negative sentiments on a per-aspect basis. This all leads to better rating prediction.

Page 186: Week 10 project presentations

Model

Page 187: Week 10 project presentations

Algorithm

● Objective:

● EM Algorithm
  ● E-Step: sample {y, z, s} for each word from the current distribution
  ● M-Step:
    ○ Fix the sampled {y, z, s} for each word
    ○ Optimize the other parameters using L-BFGS.

Page 188: Week 10 project presentations

Data

Original Paper: IMDB dataset
● 54,671 Users | 22,380 Movies | 348,415 Reviews

Our Implementation:
● Amazon Clothing Category Dataset
  ○ 1,981 Users | 1,962 Items | 11,935 Reviews
● Amazon Instant Video Dataset
  ○ 2,000 Users | 1,643 Items | 14,355 Reviews
● We opted for small datasets because JMARS inference on a large number of reviews is computationally expensive and time intensive, and we spent most of our time implementing the original method.

Page 189: Week 10 project presentations

Extension
● Add temporal dynamics to user latent factors, biases, and interest distribution.
● Idea borrowed from Collaborative Filtering with Temporal Dynamics (Koren, 2009)
● This formulation doesn't lead to a significant increase in parameters.

Page 190: Week 10 project presentations

Quantitative Results

Amazon Clothing Data
                     Without Temporal Dynamics   With Temporal Dynamics   Improvement
Baseline             1.1505                      1.1420                   0.74%
JMARS (A=6; K=5)     1.1251                      1.1152                   0.88%
JMARS (A=12; K=5)    1.1244                      1.1150                   0.84%

Amazon Video Data
                     Without Temporal Dynamics   With Temporal Dynamics   Improvement
Baseline             1.1269                      1.1170                   0.88%
JMARS (A=6; K=5)     1.0945                      1.0843                   0.93%

Baseline: JMARS without language models (i.e. a simple latent factor model).
Evaluation Metric: MSE

Page 191: Week 10 project presentations

Qualitative Results

● Background Words
  ○ Price, Product, Picture, Fit, Wear, Quality, Purchase, Material
● General Sentiment Words
  ○ Positive
    ■ Comfort, Nice, Well, Love, Buy, Good, Great, Pretty
  ○ Negative
    ■ Problem, Waste, Flaw, Review, Nothing, Worst
● Aspect Words
  ○ Material/Color
    ■ Color, Material, Elastic, Light, Care, Weather
  ○ Size/Fit
    ■ Tight, Wear, Comfort, 8/10 (shoe sizes), Inch

Page 192: Week 10 project presentations

Qualitative Results

● Aspect Sentiment Words
  ○ Material/Color
    ■ Great, Design, Soft, Quality, Durable, Cheap
  ○ Size/Fit
    ■ Shrink, True, Doesn’t/Don’t, Small, Thick
● Item-Specific Words
  ○ Item 1
    ■ Bag, Compartment, Pocket, Purse
  ○ Item 2
    ■ Shoe, Clarks, Merrell, Timberland

Page 193: Week 10 project presentations

Temporal Effect
Interest distribution change for the aspect material/color.

Date: 06/11/2013

My hubby is hard on his shoes, so I like to find him good ones at a reduced price, such as these. He likes the fit and feel of New Balance, so these will be his next pair when his current ones are too tattered to wear anymore. Good grippy sole for our rocky western trails, and decent laces that shouldn’t break with his hard use.

Date: 04/02/2014

Thanks to another reviewer I got the green ones instead of the raspberry. The green insoles have just the right arch support for my plantar fasciitis-ridden feet. I am glad to have them in my everyday Merrell slip on shoes. These insoles are not too soft, but soft enough, and after just one day of wear I don't notice them at all, which is perfect. Based on the last pair (I had the raspberry) I expect about a year from these, but will happily accept a longer wear time from them.

Page 194: Week 10 project presentations

Conclusion and Future Work

● The extension did improve on the current model, but only by a small amount.

● The reasons for only a small improvement could be:
  ○ The dataset we use is relatively small (because of limited resources), with few reviews per user, so the temporal dynamics might not be learned properly.
  ○ The linear time function might not be the best way to capture the temporal dynamics across different aspects. Other options like binning might work better.

● Add hierarchical structuring to the language models.

Page 195: Week 10 project presentations

Questions?

Page 196: Week 10 project presentations

TransNets++: Learning to Translate Better by Accounting for Higher Order Interactions

Sejal Shah, Siddharth Dinesh

Page 197: Week 10 project presentations

Goal

What effect does the inclusion of higher order interactions have on a complex feature extraction mechanism such as TransNets?

Motivation
Neural networks are predominantly used for preprocessing of data in recommender systems

Neural factorization machines have not been evaluated in settings where the features are neurally extracted

Page 198: Week 10 project presentations

TransNets

Page 199: Week 10 project presentations

Factorization Machines

Neural Factorization Machine Plain Old Factorization Machine

Page 200: Week 10 project presentations

Implementation of the paper
1. Data: Yelp Dataset 2017
   a. 4.7 million reviews
   b. The TransNets paper uses only 4.1 million reviews; the filtering criteria are unclear
2. Results
   a. Our implementation resulted in an MSE of 1.7559 (random epochs, filtered reviews)
   b. Used the result from the TransNet implementation as our baseline

Page 201: Week 10 project presentations

Extension 1: L2 Loss

● TransNets optimizes the Factorization Machine using L1 loss.
● We report MSE, so it makes sense to optimize L2 loss directly.

Page 202: Week 10 project presentations

Extension 2: Batch Normalization
● Batch Normalization is new-age alchemy to induce faster convergence of SGD

Page 203: Week 10 project presentations

Extension 3: Neural Factorization Machines
● Added neural layers to the factorization machine
● Experimented with 0, 1, and 2 hidden layers.

Page 204: Week 10 project presentations

Conclusions
● The number of training epochs is important when comparing results
● The creation of training epoch batches results in variance in the MSE of TransNet predictions, as TransNet only considers 1000 words from the reviews
● The Neural Factorization Machine only slightly improves predictions when the input is already constructed using non-linear transformations
● Would NFM improve rating prediction if one-hot embeddings of users and items were also served as input to the Factorization Machine?
  ○ Like in TransNet-Ext
● How much do these results depend on the dataset?
  ○ Confirm the lack of improvement from NFM using another dataset:
    ■ Google Local
    ■ Amazon Reviews

Page 205: Week 10 project presentations

Questions?

Page 206: Week 10 project presentations

Efficient Bayesian Methods for Graph-based Recommendation Systems

Aditi Mavalankar, Stephanie Chen and Ajitesh Gupta

Page 207: Week 10 project presentations

Original Model Overview

● Authors proposed a fast graph based method for general purpose recommendation

● It scores all items available on a 3-step path from the user in order to provide new recommendations.

● Scoring is done by making use of probability distributions based on the item ratings

[Diagram: a 3-step path from the Target User (1) to Item 1, (2) to User X, (3) to Item 2, a potential recommendation]

Page 208: Week 10 project presentations

Original Model Overview - Reliability of Item

● Binary random variable Y_j = 0 for a negative assessment, 1 for a positive assessment
● P(Y_j = 1) = θ_j ~ reliability of the item, modelled with a Beta distribution
  ∴ P(θ_j | Ratings) ~ Beta(R+, R-) (conjugate distributions)
● R+ = number of positive ratings
● R- = number of negative ratings

Page 209: Week 10 project presentations

Original Model Overview - Scoring Functions

● Posterior Inequality Scoring (PIS) - Probability of the reliability of candidate item x being greater than the reliability of item v in the user history.

● Posterior Prediction Scoring (PPS) - Probability of both v and x receiving positive assessments where we assume that Yv and Yx are independent.

● Posterior Odds Ratio Scoring (PORS) - How large the odds of x receiving a positive assessment is when compared to the odds of v receiving a positive assessment

[Diagram: 3-step path Target User → Item V → User W → Item X (potential recommendation)]
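A small sketch of how these scores could be computed from the Beta posteriors above; this is one plausible reading (PIS via Monte Carlo, PPS as the product of posterior means under independence), not necessarily the paper's exact closed forms:

```python
import numpy as np
from scipy.stats import beta

def pis(r_pos_v, r_neg_v, r_pos_x, r_neg_x, n=100_000, rng=np.random.default_rng(0)):
    """PIS ~ P(theta_x > theta_v) under independent Beta posteriors, by Monte Carlo."""
    theta_v = beta.rvs(r_pos_v, r_neg_v, size=n, random_state=rng)
    theta_x = beta.rvs(r_pos_x, r_neg_x, size=n, random_state=rng)
    return (theta_x > theta_v).mean()

def pps(r_pos_v, r_neg_v, r_pos_x, r_neg_x):
    """PPS ~ P(Y_v = 1, Y_x = 1) assuming independence: product of posterior means."""
    return beta.mean(r_pos_v, r_neg_v) * beta.mean(r_pos_x, r_neg_x)

# Item v in the user's history: 40 positive / 10 negative ratings; candidate x: 15 / 2.
print(pis(40, 10, 15, 2), pps(40, 10, 15, 2))
```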

Page 210: Week 10 project presentations

Positives of the original model

● Existing approaches often use random walks
  ○ Large number of transition matrices to be stored
  ○ Large matrix multiplication operations
  ○ Large number of simulations to converge in some cases
● No matrix multiplications ⇒ 1-2 orders of magnitude faster
● No large matrices to store ⇒ much lower space complexity

Page 211: Week 10 project presentations

Negatives of original model - Motivation for extensions

● It does not involve user information in the process of recommendation
  ○ Binary interactions: how similar are users to each other?
  ○ Unary information: how experienced is the user? How many items have they rated before?
● It also does not involve the binary interactions between items
  ○ How similar are two items?

Page 212: Week 10 project presentations

Extension 1 - User reliability score

● Users that give ratings to more items are more significant.
● We generate a reliability score for each user, and multiply each item's PIS/PPS/PORS score by it to determine whether an item ought to be recommended.

[Diagram: Target User → Item V → User W → Item X; Modified_score(I_x) = Rel(U_w) * Score(I_x)]

Page 213: Week 10 project presentations

Extension 2 - User similarity score

[Worked example: users U1 and U2 are connected to items I1-I4; per-item agreement between the two users is computed over the common items, and similarity_score = similarity / total common items = 2 / 4 = 0.5]
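A tiny sketch matching the worked example above, assuming the similarity counts the common items on which both users gave the same assessment, normalized by the number of common items (an assumed reading of the figure):

```python
def user_similarity(ratings_u, ratings_v):
    """ratings_*: dict item -> 1 (positive) or 0 (negative) assessment."""
    common = set(ratings_u) & set(ratings_v)
    if not common:
        return 0.0
    agree = sum(1 for i in common if ratings_u[i] == ratings_v[i])
    return agree / len(common)

u1 = {"I1": 1, "I2": 1, "I3": 0, "I4": 1}
u2 = {"I1": 1, "I2": 0, "I3": 1, "I4": 1}
print(user_similarity(u1, u2))   # 2 agreements out of 4 common items -> 0.5
```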

Page 214: Week 10 project presentations

Extension 2 - User similarity score

Page 215: Week 10 project presentations

Extension 3 - Item similarity score

Page 216: Week 10 project presentations

User similarity and item similarity

User Similarity Heatmap Item Similarity Heatmap

Page 217: Week 10 project presentations

Mean Average Precision

Page 218: Week 10 project presentations

Mean Reciprocal Rank

Page 219: Week 10 project presentations

Precision@5

Page 220: Week 10 project presentations

Precision@10

Page 221: Week 10 project presentations

Normalized Discounted Cumulative Gain@5

Page 222: Week 10 project presentations

Normalized Discounted Cumulative Gain@10

Page 223: Week 10 project presentations

Results on ML-100k

Method MAP MRR P@5 P@10 NDCG@5 NDCG@10

PIS 0.1459 0.4173 0.2049 0.1654 0.2482 0.2310

PIS_USS 0.1472 0.4209 0.2049 0.1667 0.2486 0.2319

PIS_ISS 0.1476 0.4266 0.2023 0.1653 0.2464 0.2307

PIS_USS_ISS 0.1479 0.4264 0.2023 0.1657 0.2465 0.2309

PPS 0.1531 0.4213 0.2102 0.1724 0.2546 0.2410

PPS_USS 0.1546 0.4235 0.2106 0.1743 0.2554 0.2432

PPS_ISS 0.1546 0.4304 0.2095 0.1727 0.2544 0.2412

PPS_USS_ISS 0.1558 0.4317 0.2076 0.1735 0.2534 0.2415

PORS 0.1147 0.2949 0.1525 0.1330 0.1694 0.1643

PORS_USS 0.1149 0.2931 0.1529 0.1326 0.1693 0.1639

PORS_ISS 0.1188 0.3054 0.1540 0.1372 0.1737 0.1704

PORS_USS_ISS 0.1195 0.3038 0.1559 0.1384 0.1751 0.1716

Page 224: Week 10 project presentations

Conclusion

● User-user similarity is observed to be more useful than item-item similarity.
● Introducing either kind of similarity improves the quality of recommendations.
● The user reliability score proves to be too naive, and hence provides no improvement.
● PPS still remains the top performer among the scoring techniques.
● Since the results are consistent on FilmTrust as well as ML-100k, it is safe to say that similar results will be exhibited on the other 5 datasets used in the original paper.

● Future work: Different algorithms to calculate user and item similarities

Page 225: Week 10 project presentations

THANK YOU!