COMP4121 Advanced Algorithms

Aleks Ignjatović
School of Computer Science and Engineering
University of New South Wales Sydney

Recommender Systems

COMP4121 1 / 30




Recommender Systems

Main purpose: the noble goal of selling you as much stuff as possible, regardless of whether you need it or not.

Examples of recommender systems:

- Netflix's, to recommend to you which movie to see next.
- Amazon's, to recommend to you which book to buy next.
- Kogan's, to recommend which gizmo to buy next.
- IEEE Xplore's, to recommend which articles might be of interest to you, given the article you have just looked at.

Two major kinds of recommender systems:

- content based: items are recommended by their intrinsic similarity (i.e., similar properties, qualities, kind, etc.). For example, a book might be recommended because you bought a book on a similar topic.
- collaborative filtering: items are recommended based on some similarity measure between users and items derived from ratings of items by the community of users.

COMP4121 3 / 30
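The content-based flavour can be illustrated with a small sketch: each item gets a hand-assigned feature vector, and items are compared by cosine similarity. The books and topic features below are hypothetical, not from the lecture.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical content features, assigned by hand: [CS, math, fiction].
book_features = {
    "Algorithms Unlocked":   [1, 1, 0],
    "Concrete Mathematics":  [1, 1, 0],
    "Moby-Dick":             [0, 0, 1],
}

# Two books on similar topics score high; unrelated books score 0.
sim = cosine(book_features["Algorithms Unlocked"],
             book_features["Concrete Mathematics"])
```

Note that the feature vectors themselves had to be produced by a human; this is exactly the weakness of content-based systems discussed on the next slide.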


Recommender Systems

Content based recommender systems suffer from a serious problem: classification according to content usually has to be done by humans, because content is a semantic notion and machines are still not good at dealing with semantics.

Collaborative filtering tends to be superior in performance and does not rely on human advice.

A Representative Example: Assume users are rating movies that they have seen. On the basis of such information we would like to recommend to a user a movie they have not already seen.

Two main approaches: the Neighbourhood Method and the Latent Factor Method.

The Neighbourhood Method comes in two flavours:

(I) based on similarity of users:

- it happens that two users gave "similar" evaluations to movies that they have both seen;
- there is a movie which one of the users liked a lot but the other user has not seen;
- in such a case it is reasonable to recommend that movie to the user who has not seen it.

COMP4121 4 / 30


Recommender Systems

(II) based on similarity of items:

- it happens that two movies receive similar ratings from most users;
- a user has seen one of the two movies and liked it;
- it is reasonable to recommend the other movie to such a user.

Note that in both approaches movies are not categorised and compared by their "intrinsic" features; we rely only on the "wisdom of the crowd".

We now want to explore how such similarities of users and of items can be measured in the most informative way.

We can construct a sparsely populated table of ratings R; the rows correspond to movies, the columns to users. The entry r(j, i) of the table, if non-empty, represents the rating user U_i gave to movie M_j (in general, item M_j).

Usually, such a rating is the "number of stars", in the range 1-5 (or a similar, relatively small rating range, usually with at most 10 or so levels).

COMP4121 5 / 30
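The sparsely populated table R can be sketched as a dictionary keyed by (movie, user) pairs: missing entries are simply not stored, which is exactly what "sparse" means here. The particular movies, users, and star counts are made up for illustration.

```python
# Sparse ratings table R: rows = movies M_j, columns = users U_i.
# Only the entries that actually exist are stored.
R = {}

def rate(R, movie, user, stars):
    """Record r(j, i), the 'number of stars' U_i gave M_j."""
    assert 1 <= stars <= 5, "ratings are stars in the range 1-5"
    R[(movie, user)] = stars

def r(R, movie, user):
    """Look up r(j, i); None represents an empty cell of the table."""
    return R.get((movie, user))

rate(R, "M1", "U1", 5)
rate(R, "M1", "U2", 3)
rate(R, "M2", "U1", 2)
```

For realistic scales (millions of users and items) one would use a sparse-matrix library rather than a plain dictionary, but the idea is the same.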


Neighbourhood Method

Having scores which are all positive numbers, say between 1 and 5, is not convenient.

A more informative number can be obtained by computing the mean r̄ of all ratings of all users for all movies (thus, the mean of all numbers in our partial table of ratings R).

We now obtain from table R a new table R* by replacing all ratings r(j, i) in R by the values r*(j, i) = r(j, i) − r̄.

The numbers r*(j, i) are already more informative: if r*(j, i) > 0 this means, in a sense, that user U_i liked movie M_j above the "average".

Some users are more generous and tend to give higher scores than the average user; some are more critical and tend to give lower scores.

We are not interested in evaluating the generosity of users; we want to assess only the "taste" of users: what they like more and what they like less.

Similarly, some movies get higher scores because they are popular at the moment for whatever reason, and some movies have less "hype" about them because they might be older and less trendy.

Again, we are not interested in the "absolute popularity" or "trendiness" of a movie; rather, we would like to assess how "intrinsically likeable" a movie is.

COMP4121 6 / 30
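The mean-centering step above can be sketched in a few lines: compute the global mean r̄ over all recorded ratings, then replace each r(j, i) by r*(j, i) = r(j, i) − r̄. The small sparse table used here is illustrative.

```python
# Sparse ratings table keyed by (movie, user); values are stars in 1-5.
R = {("M1", "U1"): 5, ("M1", "U2"): 3, ("M2", "U1"): 1}

# Global mean r_bar: the mean of all numbers present in the partial table.
r_bar = sum(R.values()) / len(R)

# Mean-centered table R*: positive entries mean "liked above average",
# negative entries mean "liked below average".
R_star = {key: stars - r_bar for key, stars in R.items()}
```

Centering by the single global mean is only the first step; as the slide hints, per-user and per-movie biases (generosity, hype) still remain in R* and are removed by further corrections.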


Neighbourhood Method

For that reason we want to remove the “systematic biases” of both the users and the movies, thus taking out the individual “generosity” of each user and the “hype” of each movie.

For that purpose we introduce for every user U_i a variable υ_i standing for the “individual bias” of user U_i, reflecting their tendency to give overall higher or lower scores.

We also introduce for every movie M_j a variable μ_j standing for the “hype bias” of movie M_j, which is due to how “fashionable” the movie is (and which usually fades quickly with time).

We now remove both systematic biases by seeking the values of the variables υ_i and μ_j which minimise the expression

S(υ⃗, μ⃗) = ∑_{(j,i)∈R} (r*(j, i) − υ_i − μ_j)²

Note that the μ_j’s are constant shifts of rows (each row corresponding to a movie) and the υ_i’s are constant shifts of columns (each column corresponding to a user).

We choose those constant shifts of each row and of each column which minimise the residuals.

Each such residual r̂(j, i) = r*(j, i) − υ_i − μ_j then better represents the “intrinsic” sentiment of user U_i for movie M_j.

COMP4121 7 / 30


Neighbourhood Method

This is a Least Squares problem and is easily reducible to a system of linear equations:

S(υ⃗, μ⃗) achieves its minimum at the υ⃗, μ⃗ for which all the partial derivatives ∂S/∂υ_i (for all i) and ∂S/∂μ_j (for all j) are equal to 0:

∂S/∂μ_j = ∂/∂μ_j ∑_{(j,i)∈R} (r*(j, i) − υ_i − μ_j)² = −2 ∑_{i:(j,i)∈R} (r*(j, i) − υ_i − μ_j) = 0

and

∂S/∂υ_i = ∂/∂υ_i ∑_{(j,i)∈R} (r*(j, i) − υ_i − μ_j)² = −2 ∑_{j:(j,i)∈R} (r*(j, i) − υ_i − μ_j) = 0

COMP4121 8 / 30
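These stationarity conditions are exactly the normal equations of the linear least-squares fit r*(j, i) ≈ υ_i + μ_j, so any standard least-squares solver can find the biases. A minimal numpy sketch with made-up ratings (the solution is only determined up to a constant shift between the υ’s and μ’s; `lstsq` resolves this by returning the minimum-norm solution):

```python
# Solve the normal equations of the fit r*(j,i) ~ upsilon_i + mu_j by building
# a design matrix with one-hot indicator columns for users and movies.
import numpy as np

# made-up observations: (movie j, user i, mean-centred rating r*)
obs = [(0, 0, 1.2), (0, 1, -0.3), (1, 0, 0.7), (1, 1, -1.1), (2, 1, 0.4)]
n_users, n_movies = 2, 3

A = np.zeros((len(obs), n_users + n_movies))
b = np.zeros(len(obs))
for row, (j, i, r) in enumerate(obs):
    A[row, i] = 1.0             # coefficient of upsilon_i
    A[row, n_users + j] = 1.0   # coefficient of mu_j
    b[row] = r

x, *_ = np.linalg.lstsq(A, b, rcond=None)
ups, mu = x[:n_users], x[n_users:]   # fitted user and movie biases
```

At any least-squares solution the residual b − Ax is orthogonal to the columns of A, which is precisely the pair of zero-sum conditions displayed above.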


Neighbourhood Method

Unfortunately, Least Squares fits usually suffer from overfitting: they minimise the objective function by choosing excessively large values for the variables.

The solution to this problem is called regularisation: we introduce a term which penalises large values of the variables.

Thus, instead, we minimise the sum

S(υ⃗, μ⃗; λ) = ∑_{(j,i)∈R} (r*(j, i) − υ_i − μ_j)² + λ (∑_i υ_i² + ∑_j μ_j²)

where λ is a suitably chosen small positive constant, usually 10⁻¹⁰ ≤ λ ≤ 10⁻².

The optimal value of λ can be “learned” in a way to be described later.

We now obtain a new table by replacing each r*(j, i) with the value r̂(j, i) = r*(j, i) − υ_i − μ_j, where the υ’s and μ’s were obtained by our regularised least squares fit.

Having removed the systematic biases of users and the trendiness of movies, we are now ready to estimate similarities of users and similarities of movies.

COMP4121 9 / 30
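The regularised objective can be minimised by the same kind of stationarity conditions as before, each gaining an extra λ term; solving each condition for its own variable gives simple alternating updates. A sketch under made-up mean-centred ratings, with λ = 0.01 from the range quoted above (the alternating scheme itself is one standard choice of solver, not something the slides prescribe):

```python
# Coordinate-descent fit of the regularised objective: each update sets mu_j
# (resp. upsilon_i) to the exact minimiser with all other variables fixed:
#   mu_j      = sum_{i:(j,i) in R} (r*(j,i) - upsilon_i) / (n_j + lambda)
#   upsilon_i = sum_{j:(j,i) in R} (r*(j,i) - mu_j)      / (n_i + lambda)
# R_star maps (movie_j, user_i) -> mean-centred rating (made-up data).
R_star = {(0, 0): 1.2, (0, 1): -0.3, (1, 0): 0.7, (1, 1): -1.1, (2, 1): 0.4}
lam = 0.01

movies = {j for j, _ in R_star}
users = {i for _, i in R_star}
mu = {j: 0.0 for j in movies}    # "hype" bias of movie M_j
ups = {i: 0.0 for i in users}    # "generosity" bias of user U_i

for _ in range(2000):            # passes until (near) convergence
    for j in movies:
        obs = [(i, r) for (jj, i), r in R_star.items() if jj == j]
        mu[j] = sum(r - ups[i] for i, r in obs) / (len(obs) + lam)
    for i in users:
        obs = [(j, r) for (j, ii), r in R_star.items() if ii == i]
        ups[i] = sum(r - mu[j] for j, r in obs) / (len(obs) + lam)

# bias-adjusted residuals r_hat(j, i) = r*(j, i) - upsilon_i - mu_j
R_hat = {(j, i): r - ups[i] - mu[j] for (j, i), r in R_star.items()}
```

Each update strictly decreases the objective, so the scheme converges to the unique minimiser of the (strictly convex, thanks to λ > 0) regularised problem.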


Neighbourhood Method - similarity of users

One of the most frequently used measures of similarity of users is the cosine similarity measure.

Let us first compare two users, U_i and U_k. We find all movies that both users have ranked and delete all other entries r(j, i) and r(j′, k) in the corresponding columns of these two users in the partial table R (thus, we remove the ratings of movies which only one of the two users has seen, and all the blank spaces).

In this way we obtain two column vectors u⃗_i and u⃗_k such that the coordinates of u⃗_i are the rankings by user U_i, and the coordinates of u⃗_k are the rankings by user U_k, of all the movies seen by both users.

The similarity of the two users is measured by the cosine of the angle between these two vectors.

Intuitively, the two users have similar tastes if the two vectors point in “similar directions”.

Recall that

cos(u⃗_i, u⃗_k) = ⟨u⃗_i, u⃗_k⟩ / (‖u⃗_i‖ · ‖u⃗_k‖)

where ⟨u⃗_i, u⃗_k⟩ = ∑_p (u⃗_i)_p (u⃗_k)_p is the scalar product of the vectors u⃗_i and u⃗_k, and ‖u⃗_k‖ = √(∑_p (u⃗_k)_p²) is the norm (the “length”) of the vector u⃗_k.

COMP4121 10 / 30
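The restriction to co-rated movies followed by the cosine formula can be sketched as below; the table of bias-adjusted ratings and the helper name `cosine_sim` are made up for illustration:

```python
# Cosine similarity of two users over the movies BOTH have rated.
# R_hat maps (movie_j, user_i) -> bias-adjusted rating (made-up data).
from math import sqrt

R_hat = {(0, 0): 1.0, (1, 0): -0.5, (2, 0): 0.5,
         (0, 1): 0.8, (1, 1): -0.4, (3, 1): 1.2}

def cosine_sim(R, i, k):
    # keep only movies rated by both users (movies 0 and 1 here)
    common = [j for j in {m for m, _ in R} if (j, i) in R and (j, k) in R]
    ui = [R[(j, i)] for j in common]   # U_i's ratings on co-rated movies
    uk = [R[(j, k)] for j in common]   # U_k's ratings on the same movies
    dot = sum(a * b for a, b in zip(ui, uk))
    return dot / (sqrt(sum(a * a for a in ui)) * sqrt(sum(b * b for b in uk)))

# (1.0, -0.5) and (0.8, -0.4) point in the same direction, so cosine = 1
print(round(cosine_sim(R_hat, 0, 1), 4))  # 1.0
```

A real implementation would also guard against pairs of users with no co-rated movies, where the denominator above is zero.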

Page 55: COMP4121 Advanced Algorithmscs4121/lectures_2019/recommender... · 2019-10-28 · Recommender Systems Content based recommender systems su er a serious problem: classi cation according

Neighbourhood Method - similarity of users

One of the most frequently used measure of similarity of users is the cosinesimilarity measure.

Let us first compare two users, Ui and Uk. We find all movies that both usershave ranked and delete all other entries r(j, i) and r(j′, k) in the corresponding

columns of these two users in the partial table R (thus, we remove ratings ofmovies which only one of the two users have seen and all the blank spaces).

In this way we obtain two column vectors ~ui and ~uk such that the coordinatesof vector ~ui are the rankings of user Ui and the coordinates of vector ~uk arerankings of user Uk of all the movies seen by both users.

The similarity of the two users is measured by the cosine of the angle betweenthese two vectors.

Intuitively, these two users have similar tastes if the two vectors point in“similar directions”.

Recall that

cos(ui, uk) =〈~ui, ~uk〉‖~ui‖ · ‖~uk‖

where 〈~ui, ~uk〉 =∑p(~ui)p(~uk)p is the scalar product of vectors ~ui and ~uk and

‖~uk‖ =√∑

p(uk)2p is the norm (the “length”) of vector ~uk.

COMP4121 10 / 30

Page 56: COMP4121 Advanced Algorithmscs4121/lectures_2019/recommender... · 2019-10-28 · Recommender Systems Content based recommender systems su er a serious problem: classi cation according

Neighbourhood Method - similarity of users

One of the most frequently used measure of similarity of users is the cosinesimilarity measure.

Let us first compare two users, Ui and Uk. We find all movies that both usershave ranked and delete all other entries r(j, i) and r(j′, k) in the corresponding

columns of these two users in the partial table R (thus, we remove ratings ofmovies which only one of the two users have seen and all the blank spaces).

In this way we obtain two column vectors ~ui and ~uk such that the coordinatesof vector ~ui are the rankings of user Ui and the coordinates of vector ~uk arerankings of user Uk of all the movies seen by both users.

The similarity of the two users is measured by the cosine of the angle betweenthese two vectors.

Intuitively, these two users have similar tastes if the two vectors point in“similar directions”.

Recall that

cos(ui, uk) =〈~ui, ~uk〉‖~ui‖ · ‖~uk‖

where 〈~ui, ~uk〉 =∑p(~ui)p(~uk)p is the scalar product of vectors ~ui and ~uk and

‖~uk‖ =√∑

p(uk)2p is the norm (the “length”) of vector ~uk.

COMP4121 10 / 30

Neighbourhood Method - similarity of users

Thus we define the similarity of users Ui and Uk as

sim(Ui, Uk) = 〈~ui, ~uk〉 / (‖~ui‖ · ‖~uk‖)

To explain why we divide the scalar product 〈~ui, ~uk〉 by the product ‖~ui‖ · ‖~uk‖ of the norms of the two vectors, note that these norms are likely to depend on the dimension of vectors ~ui and ~uk, which in turn depends on the number of movies these two users have both seen.

This is not a good feature; sim(Ui, Uk) should depend only on the "intrinsic similarity" of the tastes of the two users, and thus it should not depend on irrelevant things such as the number of movies they have both seen.

Dividing the scalar product of the two vectors by the product of their norms results in a quantity depending only on the angle between the two vectors, which more properly reflects the similarity of the tastes of the two users.

Determining the values of sim(Ui, Uk) for every pair of users is a "preprocessing" step which can be updated every few days as new ratings from users are received.

We can now predict the rating a user Ui would give to a movie Mj which Ui has not seen as follows.

COMP4121 11 / 30

Neighbourhood Method - similarity of users

Among all users who have seen movie Mj, pick the L users Ukl with the L largest values of |sim(Ui, Ukl)|.

Note that we pick not only the users Uk which are the most similar (with a large positive sim(Ui, Uk)) but also the most dissimilar ones (with a negative sim(Ui, Uk)).

We now predict the rating user Ui would give to movie Mj as

pred(j, i) = r + υi + µj + (∑_{1≤l≤L} sim(Ui, Ukl) · r(j, kl)) / (∑_{1≤l≤L} |sim(Ui, Ukl)|)

We then recommend to user Ui the movie Mj for which the predicted rating pred(j, i) is the highest.

Note that "the hype factor" µj is brought back into the equation when deciding what to recommend.

COMP4121 12 / 30
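A minimal sketch of this prediction rule, assuming the pairwise similarities and the baseline terms r, υi and µj are already available; all names here are illustrative:

```python
def predict_user_based(i, j, ratings, sim, r_bar, v, mu, L=20):
    """Predict user i's rating of movie j from the L users most similar
    (or most dissimilar) to i among those who have rated j.

    ratings[k]: dict movie -> rating for user k; sim(i, k): precomputed
    similarity; r_bar, v[i], mu[j]: global mean and user/movie offsets.
    """
    raters = [k for k in ratings if k != i and j in ratings[k]]
    # keep the L users with the largest |sim(i, k)| -- most similar
    # AND most dissimilar neighbours both carry information
    neighbours = sorted(raters, key=lambda k: abs(sim(i, k)), reverse=True)[:L]
    denom = sum(abs(sim(i, k)) for k in neighbours)
    if denom == 0:
        return r_bar + v[i] + mu[j]  # no usable neighbours: baseline only
    num = sum(sim(i, k) * ratings[k][j] for k in neighbours)
    return r_bar + v[i] + mu[j] + num / denom
```

A dissimilar neighbour (negative similarity) who loved the movie pulls the prediction down, exactly as the signed weights in the formula dictate.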

Neighbourhood Method - similarity of movies

We can in a similar way estimate the similarity of movies, working on the rows of table R (instead of the columns).

For any two movies Mj and Mn, consider all users who have rated both movies and form two vectors ~mj and ~mn whose coordinates are the ratings of the form r(j, l) and r(n, l), where l ranges over all users who rated both movies.

We can now define the cosine similarity between these two movies as

sim(Mj, Mn) = 〈~mj, ~mn〉 / (‖~mj‖ · ‖~mn‖)

If we now want to predict how a user Ui would rank a movie Mj, we pick, among all the movies Ui has seen, the L movies Mnl for which |sim(Mj, Mnl)| are the largest.

We now predict the rating user Ui would give to movie Mj as

pred(j, i) = r + υi + µj + (∑_{1≤l≤L} sim(Mj, Mnl) · r(nl, i)) / (∑_{1≤l≤L} |sim(Mj, Mnl)|)

Again, we would recommend the movie Mj with the highest predicted value pred(j, i).

COMP4121 13 / 30
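The movie-based variant admits an analogous sketch (again with illustrative names), this time ranking the movies the user has already rated by |sim(Mj, Mnl)|:

```python
def predict_item_based(j, ratings_i, item_sim, r_bar, v_i, mu_j, L=20):
    """Predict a user's rating of movie j from the L movies, among those
    the user has rated, that are most similar (or dissimilar) to j.

    ratings_i: dict movie -> rating for this user;
    item_sim(j, n): precomputed cosine similarity between movies;
    r_bar, v_i, mu_j: global mean and user/movie offsets.
    """
    seen = [n for n in ratings_i if n != j]
    neighbours = sorted(seen, key=lambda n: abs(item_sim(j, n)), reverse=True)[:L]
    denom = sum(abs(item_sim(j, n)) for n in neighbours)
    if denom == 0:
        return r_bar + v_i + mu_j  # no informative neighbours: baseline only
    num = sum(item_sim(j, n) * ratings_i[n] for n in neighbours)
    return r_bar + v_i + mu_j + num / denom
```

In practice the item-based variant is often preferred when there are far more users than movies, since the movie-movie similarity table is then much smaller and more stable than the user-user one.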

Latent Factor Method

A very different commonly used method is the Latent Factor Method.

Heuristics behind the method:

One can argue that there is only a relatively small number (up to a few hundred) of features a movie might possess, to various extents, which appeal to different tastes and which determine how much a particular user would like such a movie.
Examples of such features are "action movie", "romantic movie", "famous actors", "special effects", "violence", "humour", etc.
Let us enumerate all of these features as f1, f2, . . . , fN, where N is of the order of a few tens to a few hundred.
A movie can have each of these features, say fi, to an extent ei, where ei is, say, between 0 and 10.
Thus, to each movie Mj there corresponds a vector ~ej of length N such that its ith coordinate (~ej)i represents the extent to which movie Mj has feature fi.
We can now form a matrix F such that the rows of F correspond to movies Mj and the columns correspond to features fi.

COMP4121 14 / 30

Latent Factor Method

Thus, if feature f1 is "action movie" and if F(1, 1) = 9, this would mean that the first movie on our list has a very significant action component.

[Figure: the matrix F. Rows correspond to movies M1, M2, . . . (tens of thousands of movies); columns correspond to features f1, f2, . . . , f300 (a few hundred features). For example, the row for M1 begins 9, 1, 7, 0, . . . , 5.]

COMP4121 15 / 30

Page 87: COMP4121 Advanced Algorithmscs4121/lectures_2019/recommender... · 2019-10-28 · Recommender Systems Content based recommender systems su er a serious problem: classi cation according

Latent Factor MethodThus, if feature f1 is “action movie” and if F (1, 1) = 9 this would mean thatthe first movie on our list has a very significant action component.

. . . . . . . . . . . . . . .

f1 f2 … … … … … … f300 M1 9 1 7 0 … 5 M2 … … …

… … … … … … … … …. ….… M 10000000

tens of thousands of movies

A few hundreds of features

COMP4121 15 / 30

Latent Factor Method

We can also associate with each user Ui a column vector ~li such that its mth coordinate (~li)m is a number in the same range of, say, 0 to 10, which tells us how much user Ui likes having feature fm in a movie.

Thus, for example, if feature f1 is "action movie" and for user U1 the value of (~l1)1 is 9, this would mean that user U1 very much likes movies with a lot of action.

On the other hand, if feature f2 is "romantic" and the value of (~l1)2 is 1, this would mean that user U1 does not much like movies with lots of romance.

We can now form a matrix L whose rows correspond to features and whose columns correspond to users.

If feature fm is "special effects" and the entry L(m, i) in the mth row and ith column is, say, 5, this would mean that user Ui is ambivalent towards feature fm: they neither like nor dislike movies with lots of special effects.

COMP4121 16 / 30

Latent Factor Method

If feature f1 is "action movie" and feature f2 is "romantic movie", and L(1, 1) = 9 and L(2, 1) = 1, this would mean that the first user on our list likes movies with lots of action but does not like movies with lots of romance.

[Figure: the matrix L, with a few hundred rows f1, f2, . . . , f300 (features) and hundreds of thousands of columns U1, U2, . . . (users); e.g., in the first column, f1 = 9, f2 = 1, f3 = 7, . . . , fm = 5.]

COMP4121 17 / 30

Latent Factor Method

Assume for a moment that somehow we have access to the matrix F, which specifies for each movie Mj to what degree it has each feature fm, and the matrix L, which specifies for each user Ui how important each feature fm is.

Let us fix a movie Mj and its feature content vector ~ej. Thus, for every feature fm the coordinate (~ej)m of ~ej specifies how much of feature fm the movie Mj has.

Let us also fix a user Ui and their feature importance vector ~li. Thus, for every feature fm the coordinate (~li)m of ~li specifies how important it is that a movie has feature fm in order for Ui to like it.

Then for every user Ui and every movie Mj it would be easy to predict how much Ui would like Mj by evaluating the expression

E(j, i) = ∑_{1≤m≤N} (~ej)m (~li)m = ⟨~ej, ~li⟩.

But note that E(j, i) is precisely the entry of the matrix E = F × L in the jth row and ith column:

COMP4121 18 / 30

Latent Factor Method

[Figure: the matrix product E = F × L. F is (tens of thousands of movies) × (a few hundred features), L is (a few hundred features) × (hundreds of thousands of users), and the entry E(j, i) of the product is the inner product of the row of F for movie Mj with the column of L for user Ui.]

COMP4121 19 / 30
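This matrix product is easy to sketch numerically. A minimal NumPy sketch on a hypothetical toy instance (the sizes and values below are made up for illustration): the full prediction matrix is one matrix product, and a single predicted rating E(j, i) is the inner product of movie Mj's feature row with user Ui's taste column.

```python
import numpy as np

# Hypothetical toy instance: 4 movies, 3 features, 5 users (real
# instances have tens of thousands of movies and a few hundred features).
F = np.array([[9.0, 1.0, 7.0],   # F(j, m): extent to which movie j has feature m
              [2.0, 8.0, 3.0],
              [5.0, 5.0, 5.0],
              [0.0, 9.0, 1.0]])
L = np.array([[9.0, 1.0, 5.0, 2.0, 7.0],   # L(m, i): how much user i likes feature m
              [1.0, 8.0, 5.0, 6.0, 2.0],
              [7.0, 3.0, 5.0, 4.0, 6.0]])

E = F @ L          # E(j, i) = <e_j, l_i> for all movie/user pairs at once

# A single entry is the inner product of one row of F with one column of L.
j, i = 0, 0
print(E[j, i])     # 9*9 + 1*1 + 7*7 = 131.0
```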

Latent Factor Method

However, there is a very serious problem with such an approach to predicting how much a user Ui would like a movie Mj:

How can we determine what are the relevant few dozen to few hundred features needed to describe a movie exhaustively?

Who would assess each movie objectively according to how much of each feature such a movie has?

Even worse, how would we determine objectively how important each feature is to each user?

Solution: all of these should be "learned" from the partial table of the existing ratings of movies!

We do not even need to know what the features are or what they mean.

These features should also "emerge" from the partial table of user ratings R!

COMP4121 20 / 30

Latent Factor Method

Let N be the number of "features" we want to let emerge (with no meaning assigned whatsoever). In applications N ranges between 20 and 200.

Let #M be the number of movies in the database and #U the number of users.

Idea: Fill matrices F of size #M × N and L of size N × #U with variables F(j, m) and L(m, i) whose values are yet to be determined.

Solve the following least squares problem in the variables {F(j, m) : 1 ≤ j ≤ #M; 1 ≤ m ≤ N} ∪ {L(m, i) : 1 ≤ m ≤ N; 1 ≤ i ≤ #U}:

minimize  S(F, L) = ∑_{(j,i) : R(j,i) exists} ( ∑_{1≤m≤N} F(j, m) · L(m, i) − R(j, i) )²

Note that the total number of variables is (#M + #U) × N.

So N should be chosen so that (#M + #U) × N is a fraction of the total number of existing entries in the partially filled table R of users' ratings.

Note that even if we manage to find F(j, m)'s and L(m, i)'s which "optimally model" the data, we have no way of figuring out what "features" these numbers represent; they simply "emerged" from the data.

COMP4121 21 / 30
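The objective S(F, L) sums squared errors only over the pairs (j, i) for which a rating R(j, i) exists. A NumPy sketch with missing ratings marked by NaN (toy sizes and random data, purely illustrative):

```python
import numpy as np

def objective(F, L, R, observed):
    """S(F, L): squared error of F @ L against R, over observed entries only."""
    err = (F @ L - R)[observed]      # boolean mask skips the missing ratings
    return float(np.sum(err ** 2))

rng = np.random.default_rng(0)
num_movies, num_users, N = 6, 8, 2   # toy sizes

R = rng.integers(0, 11, size=(num_movies, num_users)).astype(float)
R[rng.random(R.shape) < 0.5] = np.nan    # hide half the ratings
observed = ~np.isnan(R)                   # which R(j, i) exist

F = rng.random((num_movies, N))  # the (#M + #U) * N variables to be optimised
L = rng.random((N, num_users))
print(objective(F, L, R, observed))
```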

Latent Factor Method

However, there is a serious problem with this approach: setting the partial derivatives of the objective S(F, L) with respect to all variables to zero results in the following system of equations:

∂S/∂F(j, m) = ∂/∂F(j, m) ∑_{(j,i) : R(j,i) exists} ( ∑_{1≤m'≤N} F(j, m') · L(m', i) − R(j, i) )²
            = 2 ∑_{i : R(j,i) exists} ( ∑_{1≤m'≤N} F(j, m') · L(m', i) − R(j, i) ) L(m, i) = 0;

∂S/∂L(m, i) = 2 ∑_{j : R(j,i) exists} ( ∑_{1≤m'≤N} F(j, m') · L(m', i) − R(j, i) ) F(j, m) = 0.

COMP4121 22 / 30

Page 119: COMP4121 Advanced Algorithmscs4121/lectures_2019/recommender... · 2019-10-28 · Recommender Systems Content based recommender systems su er a serious problem: classi cation according

Latent Factor Method

This is a huge system of cubic equations and cannot be solved feasibly. Worse, such an optimisation problem is not even convex, so a search for the optimal solution can end up in a local minimum. We therefore apply an iterative method to find an approximate solution. Note that we apply such a method to the “raw data”: no de-biasing like the one we performed in the Neighbourhood Method.

Steps:

We initially set all variables F (j,m) to the same value F (0)(j,m), say the median rating value 5.

We now solve the following Least Squares problem in the variables {L(m, i) : 1 ≤ m ≤ N ; 1 ≤ i ≤ #U} only:

$$\text{minimize} \sum_{(j,i):\, R(j,i)\ \text{exists}} \Bigg( \sum_{1 \le m \le N} F^{(0)}(j,m) \cdot L(m,i) - R(j,i) \Bigg)^{2}$$

Note that, since the F (0)(j,m) are concrete numbers rather than variables, such a Least Squares problem does reduce to a system of linear equations after we find the partials and set them to zero.

This Least Squares problem can also be regularised just as previously.
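The first half-step above can be sketched in code. With all of F fixed at the constant initial matrix F0, the Least Squares problem decouples into one small linear problem per user i, over the rows where that user's rating exists. (Toy dimensions and random “ratings”; my own illustration, not from the slides.)

```python
import numpy as np

# One half-step of the scheme: fix F at its constant initial value F0 and
# solve a small Least Squares problem for each user's column of L.
rng = np.random.default_rng(1)
M, U, N = 6, 4, 2
R = rng.integers(0, 11, size=(M, U)).astype(float)   # ratings on a 0..10 scale
observed = rng.random((M, U)) < 0.7                  # which R(j, i) exist

F0 = np.full((M, N), 5.0)    # every F(j, m) initialised to the median value 5
L0 = np.zeros((N, U))
for i in range(U):
    rows = observed[:, i]
    if rows.any():
        # minimise || F0[rows] @ L[:, i] - R[rows, i] ||^2 over L[:, i]
        L0[:, i], *_ = np.linalg.lstsq(F0[rows], R[rows, i], rcond=None)
```

Since F0 has identical columns here, `lstsq` returns the minimum-norm solution for each user; a regularised version would instead solve the corresponding ridge-regression normal equations.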


Latent Factor Method

Steps (continued):

Let L(0)(m, i) be the solutions to this Least Squares problem.

We now solve the following Least Squares problem in the variables {F (j,m) : 1 ≤ j ≤ #M ; 1 ≤ m ≤ N} only:

$$\text{minimize} \sum_{(j,i):\, R(j,i)\ \text{exists}} \Bigg( \sum_{1 \le m \le N} F(j,m) \cdot L^{(0)}(m,i) - R(j,i) \Bigg)^{2}$$

Note that, since the L(0)(m, i) are concrete numbers (obtained as the solutions of the previous Least Squares problem) rather than variables, such a Least Squares problem again reduces to a system of linear equations after we find the partials and set them to zero.

Let F (1)(j,m) be the solutions to this Least Squares problem; we now use these values to solve the following Least Squares problem in the variables {L(m, i) : 1 ≤ m ≤ N ; 1 ≤ i ≤ #U} only:


Latent Factor Method

Steps (continued):

$$\text{minimize} \sum_{(j,i):\, R(j,i)\ \text{exists}} \Bigg( \sum_{1 \le m \le N} F^{(1)}(j,m) \cdot L(m,i) - R(j,i) \Bigg)^{2}$$

We keep alternating between taking either {F (j,m) : 1 ≤ j ≤ #M ; 1 ≤ m ≤ N} or {L(m, i) : 1 ≤ m ≤ N ; 1 ≤ i ≤ #U} as the free variables, fixing the values of the other set from the previously obtained solution to the corresponding Least Squares problem.

This method is sometimes called the “Method of Alternating Projections”.

We stop such iterations when

$$\sum_{(j,m)} \big( F^{(k)}(j,m) - F^{(k-1)}(j,m) \big)^{2} + \sum_{(i,m)} \big( L^{(k)}(m,i) - L^{(k-1)}(m,i) \big)^{2}$$

becomes smaller than an accuracy threshold ε > 0.
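The whole alternating scheme (often called alternating least squares) can be sketched compactly, including the stopping rule above. This is my own illustration on toy random data, not code from the slides; the tiny noise added to the constant initialisation is a practical tweak so the initial factor matrix is not rank-deficient.

```python
import numpy as np

# A compact sketch of the alternating scheme with the stopping rule above.
rng = np.random.default_rng(2)
M, U, N = 8, 6, 2
R = rng.integers(1, 11, size=(M, U)).astype(float)
observed = rng.random((M, U)) < 0.7
eps = 1e-8                                  # accuracy threshold

# Constant initialisation as on the slides, plus tiny noise so the initial
# factor matrix is not rank-deficient (a practical tweak, not from the slides).
F = np.full((M, N), 5.0) + 0.01 * rng.normal(size=(M, N))
L = np.zeros((N, U))

for k in range(100):
    F_prev, L_prev = F.copy(), L.copy()
    # Fix F, solve one small Least Squares problem per user column of L.
    for i in range(U):
        rows = observed[:, i]
        if rows.any():
            L[:, i], *_ = np.linalg.lstsq(F[rows], R[rows, i], rcond=None)
    # Fix L, solve one small Least Squares problem per movie row of F.
    for j in range(M):
        cols = observed[j, :]
        if cols.any():
            F[j, :], *_ = np.linalg.lstsq(L[:, cols].T, R[j, cols], rcond=None)
    # Stop when the total squared change of both factor sets is below eps.
    change = np.sum((F - F_prev) ** 2) + np.sum((L - L_prev) ** 2)
    if change < eps:
        break

E = F @ L   # matrix of predicted ratings for every (movie, user) pair
```

Each half-step minimises the objective exactly over its own block of variables, so the objective never increases from one iteration to the next.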


Latent Factor Method

Steps (continued):

After we obtain the values F (k)(j,m) and L(k)(m, i) from the last iteration k, we form the corresponding matrices F of size #M × N and L of size N × #U as

$$F = \big( F^{(k)}(j,m) : 1 \le j \le \#M;\ 1 \le m \le N \big); \qquad L = \big( L^{(k)}(m,i) : 1 \le m \le N;\ 1 \le i \le \#U \big).$$

We finally set E = F × L as the final matrix of predicted ratings of all movies by all users, where E(j, i) is the prediction of the rating of movie Mj by user Ui.

Each of the N “features” fm which F (j,m) is supposed to “measure” in a movie Mj is a “latent factor” which we have no way of describing.

Some computer scientists find this troubling, but recommender systems based on the Latent Factor Method perform remarkably well in many domains.

Most likely this is because they are able to leverage the “global information”, based on the relationships among ALL ratings, more effectively than the Neighbourhood Methods, which use ratings in a more “localised” way.


Recommender Systems - conclusions

So we presented two kinds of recommender systems:

the Neighbourhood Method (in two flavours, one based on the similarity of users and the other based on the similarity of movies);

the Latent Factor Method, which can be deployed with a different number N of “latent factors” (in applications usually between 20 and 200).

So how do we decide which one we should use in a particular application?

How can we evaluate how effective a particular choice of a recommender system is?

Idea: we use real, existing data. As an example, we use the Netflix Challenge competition.

Netflix provided approximately 100 million actual ratings by 480,000 users of 17,770 movies.

The competition was to stay open until a submission was able to beat Netflix's own recommender system Cinematch by more than 10%; all the competitors then had 30 days to submit the algorithm that would be their final entry.

The team with the best-performing algorithm would receive a prize of 1 million US dollars.
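The evaluation idea above can be sketched as follows: hide a random subset of the known ratings as a test set, fit on the rest, and score the predictions by RMSE, the error measure used in the Netflix Prize. In this illustration (not from the slides) the “model” is just the trivial global-mean predictor, standing in for any of the recommenders above.

```python
import numpy as np

# Hold-out evaluation sketch: hide a random ~20% of the known ratings,
# "fit" on the rest, and score the predictions on the hidden part by RMSE.
rng = np.random.default_rng(3)
ratings = rng.integers(1, 6, size=200).astype(float)   # known ratings, 1..5

test_mask = rng.random(200) < 0.2                      # hold out ~20%
train, test = ratings[~test_mask], ratings[test_mask]

prediction = train.mean()                              # stand-in model
rmse = np.sqrt(np.mean((test - prediction) ** 2))
```

A real comparison would replace the stand-in predictor by the Neighbourhood or Latent Factor predictions for the held-out (user, movie) pairs and compare their RMSE scores.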

COMP4121 27 / 30

Page 144: COMP4121 Advanced Algorithmscs4121/lectures_2019/recommender... · 2019-10-28 · Recommender Systems Content based recommender systems su er a serious problem: classi cation according

Recommender Systems - conclusions

So we presented two kinds of recommender systems:

the Neighbourhood Method (in two flavours, one based on thesimilarity of users and another based on similarity of movies)

the Latent Factor Method which can be deployed with differentnumber N of “latent factors” (in applications usually between 20and 200)

So how do we decide which one we should use in a particular application?

How can we evaluate how effective a particular choice of a recommendersystem is?

Idea: We use real existing data. As an example we use the Netflix Challengecompetition.

Netflix provided approximately 100 million actual ratings of 480, 000 users,rating 17, 770 movies.

The competition was to stay open till a submission was able to beat theNetflix’s own recommender system Cinematch by more than 10% and then allthe competitors had 30 days to submit an algorithm which was the final entry.

The team with the best performing algorithm would get a prize of 1 million USdollars.

COMP4121 27 / 30

Page 145: COMP4121 Advanced Algorithmscs4121/lectures_2019/recommender... · 2019-10-28 · Recommender Systems Content based recommender systems su er a serious problem: classi cation according

Recommender Systems - conclusions

So we presented two kinds of recommender systems:

the Neighbourhood Method (in two flavours, one based on thesimilarity of users and another based on similarity of movies)the Latent Factor Method which can be deployed with differentnumber N of “latent factors” (in applications usually between 20and 200)

So how do we decide which one we should use in a particular application?

How can we evaluate how effective a particular choice of a recommendersystem is?

Idea: We use real existing data. As an example we use the Netflix Challengecompetition.

Netflix provided approximately 100 million actual ratings of 480, 000 users,rating 17, 770 movies.

The competition was to stay open till a submission was able to beat theNetflix’s own recommender system Cinematch by more than 10% and then allthe competitors had 30 days to submit an algorithm which was the final entry.

The team with the best performing algorithm would get a prize of 1 million USdollars.

COMP4121 27 / 30

Page 146: COMP4121 Advanced Algorithmscs4121/lectures_2019/recommender... · 2019-10-28 · Recommender Systems Content based recommender systems su er a serious problem: classi cation according

Recommender Systems - conclusions

So we presented two kinds of recommender systems:

the Neighbourhood Method (in two flavours, one based on thesimilarity of users and another based on similarity of movies)the Latent Factor Method which can be deployed with differentnumber N of “latent factors” (in applications usually between 20and 200)

So how do we decide which one we should use in a particular application?

How can we evaluate how effective a particular choice of a recommendersystem is?

Idea: We use real existing data. As an example we use the Netflix Challengecompetition.

Netflix provided approximately 100 million actual ratings of 480, 000 users,rating 17, 770 movies.

The competition was to stay open till a submission was able to beat theNetflix’s own recommender system Cinematch by more than 10% and then allthe competitors had 30 days to submit an algorithm which was the final entry.

The team with the best performing algorithm would get a prize of 1 million USdollars.

COMP4121 27 / 30

Page 147: COMP4121 Advanced Algorithmscs4121/lectures_2019/recommender... · 2019-10-28 · Recommender Systems Content based recommender systems su er a serious problem: classi cation according

Recommender Systems - conclusions

So we presented two kinds of recommender systems:

the Neighbourhood Method (in two flavours, one based on thesimilarity of users and another based on similarity of movies)the Latent Factor Method which can be deployed with differentnumber N of “latent factors” (in applications usually between 20and 200)

So how do we decide which one we should use in a particular application?

How can we evaluate how effective a particular choice of a recommendersystem is?

Idea: We use real existing data. As an example we use the Netflix Challengecompetition.

Netflix provided approximately 100 million actual ratings of 480, 000 users,rating 17, 770 movies.

The competition was to stay open till a submission was able to beat theNetflix’s own recommender system Cinematch by more than 10% and then allthe competitors had 30 days to submit an algorithm which was the final entry.

The team with the best performing algorithm would get a prize of 1 million USdollars.

COMP4121 27 / 30


Recommender Systems - conclusions

But how was the performance of the proposed algorithms measured?

The 100 million ratings that were made available served as the training data set R for the algorithms.

The test consisted of a set T of 1.4 million ratings which were NOT included in the 100 million ratings of the training data set R and were not available to the teams.

However, all the users and all the movies of these 1.4 million ratings appeared in the 100 million ratings made available (but with these users rating different movies in the training data set than the movies they rated in the 1.4 million test ratings).

The algorithms had to predict these 1.4 million ratings on the basis of the 100-million-rating training data set.

The accuracy was measured by the RMS (Root Mean Square) error

rms error = √( ∑_{(j,i)∈T} (T(j, i) − Pa(j, i))² / (1.4 × 10⁶) )

Here T(j, i) are the actual ratings included in the test set T (but not included in the training set R made available to the competitors).

Pa(j, i) are the predictions of algorithm a, made on the basis of the massive training data set R, which contained other ratings by the users involved in the test set T as well as ratings by other users of the movies involved in T (as well as many other ratings by users and of movies not involved in T).
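The RMS error above is straightforward to compute once the test ratings and a predictor are in hand. A minimal sketch, assuming a dictionary mapping (user, movie) pairs to ratings and a predictor function (both interfaces are hypothetical illustrations, not part of the Netflix API):

```python
import math

def rmse(test_ratings, predict):
    """Root-mean-square error of a predictor over a held-out test set.

    test_ratings: dict mapping (user j, movie i) -> actual rating T(j, i)
                  (the Netflix test set T had 1.4 million such entries)
    predict:      function (j, i) -> predicted rating Pa(j, i)
    """
    # Sum the squared prediction errors over every pair in the test set.
    squared_errors = [
        (actual - predict(j, i)) ** 2
        for (j, i), actual in test_ratings.items()
    ]
    # Divide by |T| and take the square root, exactly as in the formula above.
    return math.sqrt(sum(squared_errors) / len(test_ratings))
```

For example, a predictor that always outputs 3.0 against actual ratings 4.0 and 2.0 has squared errors 1 and 1, giving an RMS error of 1.0.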

COMP4121 28 / 30


Recommender Systems - conclusions

But instead of picking one recommender system over another, we can also combine several recommender systems as follows.

Let Pk(j, i) be the predicted ratings of recommender system Sk, 1 ≤ k ≤ B, where B is the number of recommender systems we have.

We can now form a composite prediction as a weighted average

P∗(j, i) = ∑_{1≤k≤B} wk Pk(j, i)

where the wk are positive weight factors with ∑_{1≤k≤B} wk = 1.
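The weighted average above can be sketched directly. This is an illustrative blend of B predictor functions under assumed interfaces (the predictor signature is hypothetical):

```python
def blend(predictions, weights):
    """Composite prediction P*(j, i) = sum over k of w_k * P_k(j, i).

    predictions: list of B predictor functions (j, i) -> rating
    weights:     list of B positive weights w_k summing to 1
    Returns a new predictor function computing the weighted average.
    """
    # Enforce the constraints stated above: positive weights summing to 1.
    assert all(w > 0 for w in weights)
    assert abs(sum(weights) - 1.0) < 1e-9

    def p_star(j, i):
        return sum(w * p(j, i) for w, p in zip(weights, predictions))

    return p_star
```

Blending two constant predictors returning 2.0 and 4.0 with equal weights 0.5 yields the composite prediction 3.0 for every (j, i).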

But how do we determine the optimal weights wk, and also the optimal values of other parameters such as the regularisation factor λ and the number N of latent factors?

The answer is pretty mundane: by an arduous trial-and-error procedure.

If we have a massive training data set, as in the case of the Netflix prize, we can remove quite a few smaller testing subsets Tq of ratings and then use the algorithm, with different values of the parameters, to predict these removed test ratings.

We can then measure the RMS error of the predictions on these test data sets Tq for different values of the parameters, tweaking the parameters until we get as small an error as possible, while making sure that we do not overfit, by using reasonably diverse and numerous test sets Tq.
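The trial-and-error sweep over held-out subsets Tq can be sketched as a simple grid search. Everything here is an assumed illustration: `build_predictor` stands in for training whatever recommender is being tuned, `param_grid` for the candidate parameter values, and ratings are a dict of (user, movie) pairs:

```python
import math
import random

def tune(ratings, build_predictor, param_grid, num_folds=5, seed=0):
    """Pick the parameter value with the lowest average RMS error over
    several held-out subsets Tq removed from the ratings.

    ratings:         dict mapping (j, i) -> actual rating
    build_predictor: function (training dict, param) -> predictor (j, i) -> rating
    param_grid:      iterable of candidate parameter values to try
    """
    # Shuffle deterministically, then split into num_folds disjoint subsets Tq.
    items = list(ratings.items())
    random.Random(seed).shuffle(items)
    folds = [items[q::num_folds] for q in range(num_folds)]

    best_param, best_err = None, float("inf")
    for param in param_grid:
        errs = []
        for q in range(num_folds):
            # Train on everything except fold q; test on the removed fold Tq.
            train = dict(x for r, f in enumerate(folds) if r != q for x in f)
            test = dict(folds[q])
            predict = build_predictor(train, param)
            se = sum((a - predict(j, i)) ** 2 for (j, i), a in test.items())
            errs.append(math.sqrt(se / len(test)))
        avg = sum(errs) / len(errs)
        if avg < best_err:
            best_param, best_err = param, avg
    return best_param, best_err
```

Using several diverse folds Tq rather than a single one is exactly the guard against overfitting mentioned above: a parameter value must do well on all of them, not just one.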

COMP4121 29 / 30


Recommender Systems - conclusions

In fact, the best-performing algorithms at the Netflix competition were combinations of dozens of components with empirically tuned parameters.

Further improvements in performance can be achieved by giving lower weights to older ratings of movies, thus also introducing the temporal dimension.

Conclusion:

Recommender systems, just like the Google PageRank algorithm, exemplify a design paradigm:

The ingredient “baseline” algorithms have a sound basis, employing increasingly sophisticated mathematical concepts and theorems.

However, the final product is an empirically obtained “tweak” of such component algorithms.

Unlike physics, computer science cannot seek “definitive”, exact methods and theories, especially for applications which involve subjective human factors such as taste or human opinion.

We look for good approximations of complex and “noisy” reality, obtained from mathematically based components through empirical testing and tweaking.

In most engineering fields, the only real criterion of the success of a new design is the commercial impact of that design!

COMP4121 30 / 30

Page 168: COMP4121 Advanced Algorithmscs4121/lectures_2019/recommender... · 2019-10-28 · Recommender Systems Content based recommender systems su er a serious problem: classi cation according

Recommender Systems - conclusions

In fact, the best performing algorithms at the Netflix competition werecombinations of dozens of components with empirically tuned parameters.

Further improvements in performance can be achieved by giving lower weightsto older ratings of movies, thus also introducing the temporal dimension.

Conclusion:

The Recommender Systems, just as the Google PageRank algorithm, exemplifya design paradigm:

The ingredient “baseline” algorithms have a sound basis employingincreasingly sophisticated mathematical concepts and theorems.

However, the final product is an empirically obtained “tweak” of suchcomponent algorithms.

Unlike Physics, Computer Science cannot seek “definitive”, exact methods andtheories, especially for applications which involve subjective human factorssuch as taste or human opinion.

We look for good approximations of complex and “noisy” reality, obtained frommathematically based components through empirical testing and tweaking.

In most of engineering fields the only real criterion of the success of a newdesign is the commercial impact of such a design!

COMP4121 30 / 30

Page 169: COMP4121 Advanced Algorithmscs4121/lectures_2019/recommender... · 2019-10-28 · Recommender Systems Content based recommender systems su er a serious problem: classi cation according

Recommender Systems - conclusions

In fact, the best performing algorithms at the Netflix competition werecombinations of dozens of components with empirically tuned parameters.

Further improvements in performance can be achieved by giving lower weightsto older ratings of movies, thus also introducing the temporal dimension.

Conclusion:

The Recommender Systems, just as the Google PageRank algorithm, exemplifya design paradigm:

The ingredient “baseline” algorithms have a sound basis employingincreasingly sophisticated mathematical concepts and theorems.

However, the final product is an empirically obtained “tweak” of suchcomponent algorithms.

Unlike Physics, Computer Science cannot seek “definitive”, exact methods andtheories, especially for applications which involve subjective human factorssuch as taste or human opinion.

We look for good approximations of complex and “noisy” reality, obtained frommathematically based components through empirical testing and tweaking.

In most of engineering fields the only real criterion of the success of a newdesign is the commercial impact of such a design!

COMP4121 30 / 30

Page 170: COMP4121 Advanced Algorithmscs4121/lectures_2019/recommender... · 2019-10-28 · Recommender Systems Content based recommender systems su er a serious problem: classi cation according

Recommender Systems - conclusions

In fact, the best performing algorithms at the Netflix competition werecombinations of dozens of components with empirically tuned parameters.

Further improvements in performance can be achieved by giving lower weightsto older ratings of movies, thus also introducing the temporal dimension.

Conclusion:

The Recommender Systems, just as the Google PageRank algorithm, exemplifya design paradigm:

The ingredient “baseline” algorithms have a sound basis employingincreasingly sophisticated mathematical concepts and theorems.

However, the final product is an empirically obtained “tweak” of suchcomponent algorithms.

Unlike Physics, Computer Science cannot seek “definitive”, exact methods andtheories, especially for applications which involve subjective human factorssuch as taste or human opinion.

We look for good approximations of complex and “noisy” reality, obtained frommathematically based components through empirical testing and tweaking.

In most of engineering fields the only real criterion of the success of a newdesign is the commercial impact of such a design!

COMP4121 30 / 30

Page 171: COMP4121 Advanced Algorithmscs4121/lectures_2019/recommender... · 2019-10-28 · Recommender Systems Content based recommender systems su er a serious problem: classi cation according

Recommender Systems - conclusions

In fact, the best performing algorithms at the Netflix competition werecombinations of dozens of components with empirically tuned parameters.

Further improvements in performance can be achieved by giving lower weightsto older ratings of movies, thus also introducing the temporal dimension.

Conclusion:

The Recommender Systems, just as the Google PageRank algorithm, exemplifya design paradigm:

The ingredient “baseline” algorithms have a sound basis employingincreasingly sophisticated mathematical concepts and theorems.

However, the final product is an empirically obtained “tweak” of suchcomponent algorithms.

Unlike Physics, Computer Science cannot seek “definitive”, exact methods andtheories, especially for applications which involve subjective human factorssuch as taste or human opinion.

We look for good approximations of complex and “noisy” reality, obtained frommathematically based components through empirical testing and tweaking.

In most of engineering fields the only real criterion of the success of a newdesign is the commercial impact of such a design!

COMP4121 30 / 30
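The temporal weighting mentioned above can be realised in many ways; one common choice is exponential decay. The sketch below is illustrative only (the function name and the 180-day half-life are assumptions, not part of the lecture): each rating's weight halves every `half_life_days` days, so a user's estimated preference is dominated by their recent ratings.

```python
import time

def decayed_mean_rating(ratings, now=None, half_life_days=180.0):
    """Weighted mean of (rating, unix_timestamp) pairs, where a rating's
    weight halves every half_life_days days, so older ratings count less."""
    if now is None:
        now = time.time()
    num = den = 0.0
    for rating, ts in ratings:
        age_days = (now - ts) / 86400.0
        w = 0.5 ** (age_days / half_life_days)
        num += w * rating
        den += w
    return num / den if den else 0.0
```

For example, a 5-star rating given today and a 1-star rating given 360 days ago (two half-lives, so weight 0.25) blend to (5·1 + 1·0.25)/1.25 = 4.2 rather than the unweighted mean of 3.0.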
