A brief look at the mechanism behind recommender systems and how YouTube and Amazon use this technology.
Tag Based Social Recommender System(RS)
Project Mentor: Ms Pragya Dwivedi
By: Aditi Gupta, Anirudh Kanjani, Abhinav Vasu Rawat, Kapil Kumar, Ashutosh Singh
Agenda
• Recommender systems: overview
• Usefulness of Recommender Systems (RS)
• Types of RS
• Relation with information architecture
• Limitations and possible improvements
• Relation with social networking
What are they and Why are they
Recommender systems are an information-filtering technique that attempts to present the information most likely to interest the user. Their advantages are:
• Enhances user experience
◦ Assists users in finding information
◦ Reduces search and navigation time
• Increases productivity
• Increases credibility
• Mutually beneficial proposition
Types of Recommender Systems(RS)
Content based RS Highlights
• Recommend items similar to those users preferred in the past
• User profiling is the key
• Items/content usually denoted by keywords
• Matching “user preferences” with “item characteristics” works for textual information
• Vector Space Model widely used
Content based RS Limitations
• Not all content is well represented by keywords, e.g. images
• Items represented by the same set of features are indistinguishable
• Overspecialization: unrated items are not shown
• Users with thousands of purchases are a problem
• New user: no history available
• Shouldn’t show items that are too different, or too similar
Collaborative RS Highlights
• Use other users’ recommendations (ratings) to judge an item’s utility
• Key is to find users/user groups whose interests match those of the current user
• Vector Space Model widely used (directions of vectors are user-specified ratings)
• More users, more ratings: better results
• Can account for items dissimilar to the ones seen in the past too
...ovielens.org
Collaborative RS Limitations
• Different users might use different scales. Possible solution: weighted ratings, i.e. deviations from average rating.
• Finding similar users/user groups isn’t very easy.
• New user: no preferences available.
• New item: no ratings available.
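The "deviations from average rating" idea can be sketched in a few lines. This is a minimal illustration, not the project's code; the function name `mean_center` and the sample users are made up for the example.

```python
from statistics import mean

def mean_center(ratings):
    """Return each user's ratings as deviations from that user's own average."""
    centered = {}
    for user, items in ratings.items():
        avg = mean(items.values())
        centered[user] = {item: r - avg for item, r in items.items()}
    return centered

# Two hypothetical users with the same taste but different rating scales.
ratings = {
    "strict_user": {"m1": 1, "m2": 2, "m3": 3},
    "generous_user": {"m1": 3, "m2": 4, "m3": 5},
}
centered = mean_center(ratings)
```

After centering, both users have identical deviation profiles, so a similarity measure no longer penalizes the difference in scale.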
Hybrid RS
• Uses both content based and collaborative filtering.
• Introduced to avoid the limitations found in both content and collaborative methods.
• Example: Netflix makes recommendations by comparing the watching and searching habits of similar users (i.e. collaborative filtering) as well as by offering movies that share characteristics with films that a user has rated highly (content-based filtering).
Other Variations of RS
Cluster Models
• Create clusters or groups.
• Put a customer into a category.
• Classification simplifies the task of user matching.
• Better scalability and performance.
• Lower accuracy than the normal collaborative filtering method.
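As a rough sketch of "put a customer into a category": assuming the clusters are already summarized by centroids of rating vectors, a user can be assigned to the nearest one. The centroid values, segment names, and the choice of Euclidean distance are all assumptions for illustration.

```python
import math

def assign_cluster(user_vector, centroids):
    """Return the name of the centroid closest to the user (Euclidean distance)."""
    return min(centroids, key=lambda name: math.dist(user_vector, centroids[name]))

# Hypothetical centroids: average ratings for (action films, romance films).
centroids = {
    "action_fans": [5.0, 1.0],
    "romance_fans": [1.0, 5.0],
}
segment = assign_cluster([4.0, 2.0], centroids)
print(segment)
```

Matching a user against a handful of centroids is much cheaper than comparing against every other user, which is where the scalability gain comes from; the loss in accuracy comes from treating everyone in a segment alike.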
Possible Improvement in RS
Better understanding of users and items
– Social network (social RS)
1. User level
• Highlighting interests, hobbies, and keywords people have in common
2. Item level
• Link the keywords to e-commerce (by RS algorithms)
What is a tag?
A tag is a piece of information that describes the data or content it is assigned to. Tags are non-hierarchical keywords used for Internet bookmarks, digital images, videos, files and so on. A tag by itself carries no formal semantics.
Tagging serves many functions, including:
• Classification
• Marking ownership
• Describing content type
• Online identity
About tagging
Labeling and tagging are done to aid in classification, marking ownership, noting boundaries and indicating online identity. Tags may take the form of words, images or marks.
Online and internet databases deploy them as a way for publishers to help users find content.
Where are they used?
• Social bookmarking: lets users add tags to their bookmarks.
• Flickr: allows users to add their own text tags to each of their pictures, constructing flexible and easy metadata that makes pictures highly searchable.
• YouTube: also implements tagging. Content is categorised using simple keywords. Users add tags, which are visible and themselves link to other items that share that keyword tag.
Examples
• Within a blog: many blog systems allow authors to add free-form tags to a post. For example, a post may display that it has been tagged with baseball and tickets.
• For an event: an official tag is a keyword adopted by an event for use in web applications, such as blog entries, photos of the event and presentation slides.
• In research: associate an item with a small number of themes; a group of tags for these themes can then be attached. In this way, free-form classification allows an author to manage large amounts of information.
Tag types
Triple tags: a triple tag, or machine tag, uses a special syntax to define extra semantic information about the tag, making it more meaningful for interpretation.
Triple tags comprise a namespace, a predicate and a value.
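A small sketch of splitting a triple tag into its three parts. The slide does not state a syntax; the `namespace:predicate=value` spelling used here is an assumption that follows the common machine-tag convention, and `TripleTag` and `parse_triple_tag` are illustrative names.

```python
from typing import NamedTuple, Optional

class TripleTag(NamedTuple):
    namespace: str
    predicate: str
    value: str

def parse_triple_tag(tag: str) -> Optional[TripleTag]:
    """Parse 'namespace:predicate=value'; return None for ordinary tags."""
    head, sep, value = tag.partition("=")
    if not sep:
        return None
    namespace, sep, predicate = head.partition(":")
    if not sep or not namespace or not predicate:
        return None
    return TripleTag(namespace, predicate, value)

print(parse_triple_tag("geo:lat=57.64"))
print(parse_triple_tag("baseball"))
```

Plain keywords such as "baseball" fall through as `None`, so the same tag list can mix ordinary and triple tags.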
Tag types
Hash tag: a word or phrase prefixed with #. It is a form of metadata tag. Short messages on social networking sites such as Twitter and Facebook may be tagged by putting # before important words. Hash tags provide a means of grouping such messages, since one can search for a hash tag and get the set of messages that contain it.
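Grouping messages by hash tag can be sketched as a small inverted index. This is an illustration only: the regular expression treats a hash tag as "#" followed by word characters, and lower-casing the tags is an assumption, not a rule of any particular site.

```python
import re
from collections import defaultdict

HASHTAG = re.compile(r"#(\w+)")

def index_by_hashtag(messages):
    """Map each hash tag (lower-cased) to the messages that contain it."""
    index = defaultdict(list)
    for msg in messages:
        # A set avoids listing a message twice if it repeats a tag.
        for tag in {t.lower() for t in HASHTAG.findall(msg)}:
            index[tag].append(msg)
    return index

messages = [
    "Great #baseball game tonight",
    "Got #tickets for #Baseball",
    "No tags here",
]
index = index_by_hashtag(messages)
print(index["baseball"])
```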
Knowledge tag: a type of meta-information that describes or defines some aspect of an information resource. Knowledge tags are metadata that capture knowledge in the form of descriptions, classifications, comments, notes, hyperlinks, etc.
Information Retrieval Systems
Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full text.
The Information Retrieval Cycle
[Figure: the information retrieval cycle. Labels: Source Selection, Resource, Query Formulation, Query, Search, Ranked List, Selection, Documents, Result; feedback: query reformulation, relevance feedback.]
04/11/2023 Introduction to Information Retrieval
Search Process
[Figure: the search process. Labels: Source Selection, Resource, Query Formulation, Query, Search, Ranked List, Selection, Documents, Results; indexing path: Document Collection, Indexing, Index.]
Slide is from Jimmy Lin’s tutorial
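Implementation: How a Recommender System Works
With content based filtering, the cosine similarity formula is used:
\[ sim(c,s) = \cos(\vec{w}_c, \vec{w}_s) = \frac{\vec{w}_c \cdot \vec{w}_s}{\lVert \vec{w}_c \rVert \, \lVert \vec{w}_s \rVert} \]
where w_c and w_s are TF-IDF weight vectors.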
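Cosine similarity between two weight vectors can be sketched as follows. The vectors are assumed to be equal-length lists of TF-IDF weights; returning 0.0 for an all-zero vector is a convenience choice made for this example.

```python
import math

def cosine_similarity(w_c, w_s):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(w_c, w_s))
    norm_c = math.sqrt(sum(a * a for a in w_c))
    norm_s = math.sqrt(sum(b * b for b in w_s))
    if norm_c == 0 or norm_s == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as no match
    return dot / (norm_c * norm_s)

print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # same direction
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # no shared terms
```

Because only the direction of the vectors matters, a long item description and a short user profile can still score highly if they weight the same terms.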
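Implementation: How a Recommender System Works
With collaborative filtering, the Pearson similarity formula is used:
\[ sim(x,y) = \frac{\sum_{s \in S_{xy}} (r_{x,s} - \bar{r}_x)(r_{y,s} - \bar{r}_y)}{\sqrt{\sum_{s \in S_{xy}} (r_{x,s} - \bar{r}_x)^2 \, \sum_{s \in S_{xy}} (r_{y,s} - \bar{r}_y)^2}} \]
• sim(x,y): similarity between users x and y
• S_xy: the set of items rated by both x and y
• r_{x,s}: rating for item “s” given by user “x”
• r_{y,s}: rating for item “s” given by user “y”
• \bar{r}_y: mean of all ratings by user “y”
• \bar{r}_x: mean of all ratings by user “x”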
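A minimal sketch of Pearson similarity between two users, with each user represented as a dict from item id to rating. Following the definitions above, the means are taken over all of a user's ratings while the sums run over items both users rated; returning 0.0 when there is no overlap or no variance is an assumption made for this example.

```python
import math

def pearson_similarity(ratings_x, ratings_y):
    """Pearson correlation over the items rated by both users."""
    common = set(ratings_x) & set(ratings_y)
    if not common:
        return 0.0
    mean_x = sum(ratings_x.values()) / len(ratings_x)
    mean_y = sum(ratings_y.values()) / len(ratings_y)
    num = sum((ratings_x[s] - mean_x) * (ratings_y[s] - mean_y) for s in common)
    den = math.sqrt(
        sum((ratings_x[s] - mean_x) ** 2 for s in common)
        * sum((ratings_y[s] - mean_y) ** 2 for s in common)
    )
    return num / den if den else 0.0

x = {"a": 1, "b": 2, "c": 3}
y = {"a": 2, "b": 4, "c": 6}   # same ordering as x
z = {"a": 3, "b": 2, "c": 1}   # opposite ordering
print(pearson_similarity(x, y), pearson_similarity(x, z))
```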
Implementation: How a Recommender System Works
After the similarities and means have been calculated, the Resnick formula is applied to predict the rating that a particular user might give to an item. The Resnick formula for prediction is:
\[ c(i) = c_{mean} + \frac{\sum_{p \in P(i)} (p(i) - p_{mean}) \, sim(c,p)}{\sum_{p \in P(i)} |sim(c,p)|} \]
• c(i): predicted rating for item ‘i’ in consumer profile c
• c_mean: mean of ratings for all items rated by user ‘c’
• p_mean: mean of ratings for all items rated by user ‘p’
• p(i): rating for item ‘i’ by a producer p who has rated ‘i’
• sim(c,p): measure of the similarity between profiles c and p
• P(i): neighbourhood generated for the user ‘c’
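The Resnick prediction can be sketched as below. To keep the example self-contained, each neighbour is passed in as a precomputed tuple `(p_rating_for_i, p_mean, sim_c_p)`; that input shape and the fallback to the consumer's mean when no neighbour contributes are assumptions of this sketch.

```python
def resnick_predict(c_mean, neighbours):
    """Predict a rating from the consumer's mean and similarity-weighted
    deviations of each neighbour's rating from that neighbour's own mean."""
    num = 0.0
    den = 0.0
    for p_rating, p_mean, sim in neighbours:
        num += (p_rating - p_mean) * sim
        den += abs(sim)
    if den == 0:
        return c_mean  # no usable neighbours: fall back to the consumer's mean
    return c_mean + num / den

# Hypothetical neighbourhood for one item: (rating for i, mean rating, similarity).
neighbours = [(5, 4.0, 0.8), (2, 3.0, 0.4)]
prediction = resnick_predict(3.0, neighbours)
print(round(prediction, 3))
```

Here the first neighbour rated the item one point above their own average and the second one point below; the more similar neighbour dominates, so the prediction lands above the consumer's mean of 3.0.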
Similarity Model
Vector-space model
This is a model that allows us to extract documents based on the tags given by a user through a query. The vector space model uses TF-IDF weights to categorise the documents into relevant and non-relevant ones. The end result is the document(s) with the best similarity to the tags given in the query.
The Vector-Space Model
• Assume t distinct terms remain after preprocessing; call them index terms or the vocabulary.
• These “orthogonal” terms form a vector space. Dimension = t = |vocabulary|
• Each term, i, in a document or query, j, is given a real-valued weight, w_ij.
• Both documents and queries are expressed as t-dimensional vectors: d_j = (w_1j, w_2j, …, w_tj)
Document Collection
• A collection of n documents can be represented in the vector space model by a term-document matrix.
• An entry in the matrix corresponds to the “weight” of a term in the document; zero means the term has no significance in the document or it simply doesn’t exist in the document.

      T1    T2    …    Tt
D1    w11   w21   …    wt1
D2    w12   w22   …    wt2
:     :     :          :
Dn    w1n   w2n   …    wtn
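A term-document matrix like the one above can be built in a few lines. In this sketch the weights are raw term counts, which is an assumption for illustration; rows are documents and columns are terms, matching the layout of the table.

```python
from collections import Counter

def term_document_matrix(docs):
    """Return (vocabulary, rows): one row of term counts per document."""
    vocab = sorted({term for doc in docs for term in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])  # 0 if the term is absent
    return vocab, rows

# Two tiny pre-tokenized documents (made-up data).
docs = [["cat", "dog", "cat"], ["dog", "fish"]]
vocab, rows = term_document_matrix(docs)
print(vocab)
print(rows)
```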
Issues for Vector Space Model
• How to determine important words in a document?
◦ Word sense?
◦ Word n-grams (and phrases, idioms, …) as terms
• How to determine the degree of importance of a term within a document and within the entire collection?
• How to determine the degree of similarity between a document and the query?
• In the case of the web, what is a collection and what are the effects of links, formatting information, etc.?
Term Weights: Term Frequency
• More frequent terms in a document are more important, i.e. more indicative of the topic.
  f_ij = frequency of term i in document j
• May want to normalize term frequency (TF) by dividing by the frequency of the most common term in the document:
  TF_ij = f_ij / max_i{f_ij}
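The max-normalized term frequency can be sketched as below; the function name `normalized_tf` and the sample tokens are illustrative.

```python
from collections import Counter

def normalized_tf(tokens):
    """Term frequency divided by the frequency of the most common term."""
    counts = Counter(tokens)
    if not counts:
        return {}
    max_f = max(counts.values())
    return {term: f / max_f for term, f in counts.items()}

tf = normalized_tf(["A", "A", "A", "B", "B", "C"])
print(tf)
```

The most common term always gets TF = 1, so long and short documents are put on a comparable scale.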
Term Weights: Inverse Document Frequency
• Terms that appear in many different documents are less indicative of overall topic.
  df_i = document frequency of term i
       = number of documents containing term i
  IDF_i = inverse document frequency of term i
        = log2(N / df_i)
  (N: total number of documents)
• An indication of a term’s discrimination power.
• Log used to dampen the effect relative to TF.
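Document frequency and IDF over a small collection can be sketched as follows. Each document is assumed to be a set of terms, and the four-document collection is made up for the example.

```python
import math
from collections import Counter

def inverse_document_frequency(docs):
    """Return {term: log2(N / df)} for a list of documents given as term sets."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log2(n_docs / count) for term, count in df.items()}

docs = [{"a", "b"}, {"a"}, {"a", "c"}, {"a"}]
idf = inverse_document_frequency(docs)
print(idf)
```

A term that appears in every document ("a") gets IDF 0, so it contributes nothing to discrimination, while the rarer terms get positive weights.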
TF-IDF Weighting
• A typical combined term importance indicator is TF-IDF weighting:
  w_ij = TF_ij × IDF_i = TF_ij × log2(N / df_i)
• A term occurring frequently in the document but rarely in the rest of the collection is given high weight.
• Many other ways of determining term weights have been proposed.
• Experimentally, TF-IDF has been found to work well.
Computing TF-IDF - An Example
Given a document containing terms with given frequencies:
  A(3), B(2), C(1)
Assume the collection contains 10,000 documents and the document frequencies of these terms are:
  A(50), B(1300), C(250)
Then:
  A: TF = 3/3; IDF = log2(10000/50) = 7.6; TF-IDF = 7.6
  B: TF = 2/3; IDF = log2(10000/1300) = 2.9; TF-IDF = 2.0
  C: TF = 1/3; IDF = log2(10000/250) = 5.3; TF-IDF = 1.8
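The worked example can be checked with a short script that uses the same frequencies, document frequencies and collection size as given above; only the variable names are introduced here.

```python
import math

N = 10_000                                   # documents in the collection
term_freq = {"A": 3, "B": 2, "C": 1}         # frequencies within the document
doc_freq = {"A": 50, "B": 1300, "C": 250}    # documents containing each term

max_f = max(term_freq.values())
tfidf = {
    term: (f / max_f) * math.log2(N / doc_freq[term])
    for term, f in term_freq.items()
}
for term, weight in tfidf.items():
    print(term, round(weight, 1))
```

Term C ends up weighted almost as highly as B despite occurring half as often in the document, because it is much rarer in the collection.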
Performance and Correction Measures
• Precision: the fraction of documents retrieved that are relevant to the user’s information need.
• Recall: the fraction of the documents relevant to the query that are successfully retrieved.
• F-Measure
• Mean Absolute Error (MAE)
Precision vs. Recall
[Figure: the Relevant and Retrieved document sets shown as overlapping regions within all docs.]
\[ \text{Recall} = \frac{|\text{RelRetrieved}|}{|\text{Rel in Collection}|} \qquad \text{Precision} = \frac{|\text{RelRetrieved}|}{|\text{Retrieved}|} \]
F-Measure
The weighted harmonic mean of precision and recall. The traditional F-measure, or balanced F-score, is:
F-measure = 2 × precision × recall / (precision + recall)
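Precision, recall and the balanced F-measure can be computed together from the retrieved and relevant sets. A minimal sketch with made-up document ids; returning 0.0 when a denominator would be zero is a convention chosen for this example.

```python
def precision_recall_f1(retrieved, relevant):
    """Return (precision, recall, balanced F-measure) for two collections of ids."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)          # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f = precision_recall_f1(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d2", "d5"],
)
print(p, r, f)
```

With two hits out of four retrieved and three relevant, precision is 1/2, recall is 2/3 and the F-measure is 4/7.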
Mean Absolute Error (MAE)
The mean absolute error for a set of queries is the average of the absolute difference between the predicted rating and the actual rating for each query:
\[ \text{MAE} = \frac{1}{n} \sum_{i=1}^{n} |f_i - y_i| = \frac{1}{n} \sum_{i=1}^{n} |e_i| \]
where n is the total number of queries, f_i is the prediction, y_i is the true value, and the absolute error is |e_i| = |f_i - y_i|.
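MAE over paired predicted and actual ratings can be sketched as below; the sample ratings are made up.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

mae = mean_absolute_error([4, 3, 5], [5, 3, 3])
print(mae)
```

Lower values are better; an MAE of 1.0 on a 1 to 5 scale means predictions are off by one star on average.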
Datasets
We have studied the datasets of some popular sites and have implemented basic functions such as Pearson similarity, cosine similarity, the Resnick prediction formula and the TF-IDF model on them. The datasets we studied are as follows:
• MovieLens Dataset
• Flickr Dataset
MovieLens Dataset
• MovieLens is a recommender system and virtual community website that recommends films based on user-provided ratings.
• The dataset on which we have worked contains a total of 100,000 ratings from 943 users on 1682 movie items.
• It was collected from September 19th, 1997 to April 22nd, 1998.
• The dataset includes a file in which every entry is a 4-tuple: <user_id> <item_id> <rating> <timestamp>.
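Reading such 4-tuples can be sketched as below. The tab delimiter is an assumption about the file layout, and the two sample rows are invented for the example rather than taken from the dataset; `Rating` and `read_ratings` are illustrative names.

```python
import csv
import io
from typing import NamedTuple

class Rating(NamedTuple):
    user_id: int
    item_id: int
    rating: int
    timestamp: int

def read_ratings(handle, delimiter="\t"):
    """Parse <user_id> <item_id> <rating> <timestamp> rows from a text stream."""
    reader = csv.reader(handle, delimiter=delimiter)
    return [Rating(*map(int, row)) for row in reader if row]

# Stand-in for an opened ratings file (made-up rows).
sample = io.StringIO("1\t10\t4\t880000000\n2\t10\t5\t880000100\n")
ratings = read_ratings(sample)
print(ratings[0])
```

With a real file, the same function would be called on `open(path, newline="")` instead of the in-memory sample.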
Flickr Dataset
• Flickr is an image and video hosting website where people host images that they embed in blogs and social media.
• The dataset we have used is MIRFLICKR-25000, a collection of 25000 images downloaded from the social photography site Flickr through its public API.
• The average number of tags per image is 8.94. In the collection there are 1386 tags which occur in at least 20 images.
• The dataset includes a metadata folder named “meta” that contains, for each image, a file with all the tags associated with that image.
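Statistics like "average tags per image" and "tags occurring in at least 20 images" can be computed once the tags are loaded. This sketch assumes they are already in a dict from image id to tag list; the three images below are made up, and the file format of the "meta" folder is not modelled.

```python
from collections import Counter

def average_tags_per_image(image_tags):
    """Mean number of tags over all images."""
    return sum(len(tags) for tags in image_tags.values()) / len(image_tags)

def frequent_tags(image_tags, min_images):
    """Tags that occur in at least `min_images` distinct images, with counts."""
    counts = Counter(tag for tags in image_tags.values() for tag in set(tags))
    return {tag: n for tag, n in counts.items() if n >= min_images}

image_tags = {
    "im1": ["sky", "sea"],
    "im2": ["sky"],
    "im3": ["sky", "sea", "dog"],
}
print(average_tags_per_image(image_tags))
print(frequent_tags(image_tags, min_images=2))
```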
Visit my blog for more: www.csekapil.wordpress.com
Motilal Nehru National Institute of Technology, Allahabad (India)