Events, Signals & Recommendations

Francis Crimmins, Lucidworks

March 25, 2015

Search Results Without Click Signals

Search Results With Click Signals

Sample Search Result Click Events

Single User Events

Multiple User Events

Blue = Locals, Red = Tourists

• Logs

• Clickstream

• Call Detail Record (CDR)

• Financial Transactions

• Sensor Data

Example Event Data

• Denormalized (a lot of redundancy)

• Nested

• Schema-less

• Historical (immutable, i.e. it will never change)

• Isolated (each event can often be processed independently, in parallel)

Characteristics of Event Data
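A minimal Python sketch of what these characteristics mean when processing events, assuming events arrive as nested dictionaries. The `ts` field and the `flatten`/`flatten_all` helpers are hypothetical; `user`, `query` and `docId` come from the click tuple used later in this deck. Because each event is isolated, the flattening step can be mapped over events independently (here with a thread pool), and because events are historical, the code builds new records instead of modifying the originals.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List

def flatten(event: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested, schema-less event into dotted keys.

    Builds a new dict, so the original event is left untouched.
    """
    flat: Dict[str, Any] = {}
    for key, value in event.items():
        name = prefix + key
        if isinstance(value, dict):
            # Nested object: recurse and prefix its keys with "name."
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat

def flatten_all(events: List[Dict[str, Any]], workers: int = 4) -> List[Dict[str, Any]]:
    # Events are isolated, so any worker can process any event;
    # pool.map still returns the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(flatten, events))

events = [
    {"user": "u1", "click": {"query": "ipod", "docId": "d1"}},
    {"user": "u2", "ts": 1427241600},  # a different shape: there is no fixed schema
]
print(flatten_all(events))
# [{'user': 'u1', 'click.query': 'ipod', 'click.docId': 'd1'}, {'user': 'u2', 'ts': 1427241600}]
```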

• A time series is a sequence of data points.

• The data points are typically measured at successive times, spaced at uniform intervals.

• A time series has a natural temporal ordering.

Time Series Data
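Raw events usually arrive at irregular times, so one common way to turn them into a time series with uniform intervals is to count events per fixed-width bucket. A minimal sketch, assuming integer timestamps in seconds; the `to_time_series` helper and the bucket width are illustrative choices, not taken from the slides.

```python
from collections import Counter
from typing import Iterable, List, Tuple

def to_time_series(timestamps: Iterable[int], interval: int) -> List[Tuple[int, int]]:
    """Count events per fixed-width interval of `interval` seconds.

    Returns (bucket_start, count) pairs in temporal order. Empty buckets are
    included, so consecutive points are always exactly `interval` apart.
    """
    # Round each timestamp down to the start of its bucket.
    counts = Counter(ts - ts % interval for ts in timestamps)
    if not counts:
        return []
    first, last = min(counts), max(counts)
    return [(t, counts[t]) for t in range(first, last + interval, interval)]

print(to_time_series([0, 5, 61, 190], interval=60))
# [(0, 2), (60, 1), (120, 0), (180, 1)]
```

Filling empty buckets with zero keeps the natural temporal ordering and the uniform spacing, which most time-series models expect.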

2014 South Napa Earthquake: Seismic “View”

2014 South Napa Earthquake: Twitter “View”

• A stochastic model for a time series will generally reflect the fact that observations close together in time are more closely related than observations further apart.

• This is one reason to group some types of event data into sessions.

• A session is a time window covering all events generated by a given source (such as a unique user ID).

Stochastic Model
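A minimal sessionization sketch. The slide defines a session as a time window over one source's events. This version assumes the window closes after a period of inactivity, with a 30-minute gap chosen purely as an example. The `(user_id, timestamp, payload)` event shape and the `sessionize` name are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, int, str]  # hypothetical shape: (user_id, timestamp in seconds, payload)

def sessionize(events: List[Event], max_gap: int = 30 * 60) -> Dict[str, List[List[Event]]]:
    """Group events by source (user ID), then split each user's events into
    sessions wherever two consecutive events are more than max_gap seconds apart.
    The 30-minute default is an assumption, not a standard."""
    by_user: Dict[str, List[Event]] = defaultdict(list)
    for event in events:
        by_user[event[0]].append(event)

    sessions: Dict[str, List[List[Event]]] = {}
    for user, user_events in by_user.items():
        user_events.sort(key=lambda e: e[1])  # restore temporal ordering
        current = [user_events[0]]
        user_sessions = [current]
        for prev, event in zip(user_events, user_events[1:]):
            if event[1] - prev[1] > max_gap:
                current = [event]  # inactivity gap: start a new session
                user_sessions.append(current)
            else:
                current.append(event)
        sessions[user] = user_sessions
    return sessions

events = [
    ("u1", 0, "query:ipod"),
    ("u1", 100, "click:d1"),
    ("u2", 50, "query:tv"),
    ("u1", 5000, "query:ipod case"),
]
for user, user_sessions in sessionize(events).items():
    print(user, [len(s) for s in user_sessions])
# u1 [2, 1]
# u2 [1]
```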

• If entities frequently co-occur, we may infer an interdependence between those entities.

• For example, a click event can be described by a click(user, query, docId) tuple.

• We can then associate each of those entities with one another, and with the entities of other events in the same session.

Co-Occurrence (Adjacency) Matrix
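A sketch of generating co-occurrences for one session, assuming each click is a `(user, query, docId)` tuple as on the slide. Every entity is tagged with its field type, so cross-field pairs such as (query, docId) come out alongside same-field pairs. Here a pair is counted once per session (an assumption; counting once per event is another option), so summing the per-session counters gives the number of sessions in which two entities co-occurred. `session_cooccurrences` is a hypothetical name.

```python
from collections import Counter
from itertools import permutations
from typing import Iterable, Tuple

Node = Tuple[str, str]  # (field type, value), e.g. ("query", "ipod")

def session_cooccurrences(clicks: Iterable[Tuple[str, str, str]]) -> Counter:
    """Count ordered pairs of distinct entities that appear in the same session."""
    nodes = set()
    for user, query, doc_id in clicks:
        nodes.update({("user", user), ("query", query), ("docId", doc_id)})
    # Associate every entity with every other entity in the session, in both
    # directions, so the counts can later feed a directed graph.
    return Counter(permutations(sorted(nodes), 2))

session_a = [("u1", "ipod", "d1"), ("u1", "ipod case", "d2")]
session_b = [("u2", "ipod", "d1")]
totals = session_cooccurrences(session_a) + session_cooccurrences(session_b)
print(totals[(("query", "ipod"), ("docId", "d1"))])  # 2: they co-occur in both sessions
```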

• NB: We generate co-occurrence relations not just between values of the same field type, e.g. (query, query) pairs, but also "cross-field" relations, e.g. (query, docId) and (docId, user) pairs.

• This gives us an N x N adjacency matrix, where N is the number of unique instances of all the field types we want to calculate co-occurrence relations for.

Co-Occurrence (Adjacency) Matrix
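Given co-occurrence counts keyed by ordered (source, target) node pairs, the N x N matrix can be sketched by giving each unique (field type, value) node an index. A dense list-of-lists is used only to keep the example small; since a dense matrix grows quadratically with N, a real system would more likely store it sparsely. `adjacency_matrix` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # (field type, value)

def adjacency_matrix(pair_counts: Counter) -> Tuple[List[Node], List[List[int]]]:
    """Turn {(source, target): count} into a dense N x N matrix, where N is the
    number of unique nodes across all field types."""
    nodes = sorted({n for pair in pair_counts for n in pair})
    index: Dict[Node, int] = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for (src, dst), count in pair_counts.items():
        matrix[index[src]][index[dst]] += count
    return nodes, matrix

counts = Counter({
    (("query", "ipod"), ("docId", "d1")): 3,
    (("docId", "d1"), ("query", "ipod")): 3,
    (("query", "ipod"), ("query", "ipod case")): 1,
})
nodes, m = adjacency_matrix(counts)
print(nodes)  # [('docId', 'd1'), ('query', 'ipod'), ('query', 'ipod case')]
print(m)      # [[0, 3, 0], [3, 0, 1], [0, 0, 0]]
```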

• The co-occurrence matrix can be used to generate a weighted directed graph.

• Each node can be of any of the types we are interested in (users, queries, sessions, documents, etc.).

• The weight of an edge between two nodes is their co-occurrence frequency, i.e. how many times the source and target nodes have occurred together.

Weighted Directed Graph
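The same counts can be held as a weighted directed graph rather than a matrix. A minimal adjacency-list sketch, assuming co-occurrence counts keyed by ordered (source, target) node pairs: each edge `source -> target` carries its count as the weight, and each node's edges are sorted strongest first so the top neighbours are cheap to read. `build_graph` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # (field type, value)

def build_graph(pair_counts: Counter) -> Dict[Node, List[Tuple[Node, int]]]:
    """Adjacency list: node -> [(neighbour, co-occurrence count), ...],
    sorted by descending weight (ties broken by node for determinism)."""
    graph: Dict[Node, List[Tuple[Node, int]]] = defaultdict(list)
    for (src, dst), weight in pair_counts.items():
        graph[src].append((dst, weight))
    for edges in graph.values():
        edges.sort(key=lambda edge: (-edge[1], edge[0]))
    return dict(graph)

counts = Counter({
    (("query", "ipod"), ("docId", "d1")): 5,
    (("query", "ipod"), ("docId", "d2")): 2,
    (("query", "ipod"), ("query", "ipod case")): 7,
})
g = build_graph(counts)
print(g[("query", "ipod")])
# [(('query', 'ipod case'), 7), (('docId', 'd1'), 5), (('docId', 'd2'), 2)]
```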

• For a given item, we may only have a small number of items that co-occur with it directly (based on the co-occurrence matrix).

• By adding in items reached by navigating the graph (which is built from the matrix), we increase the diversity of suggestions.

• Items that appear in multiple source lists then rise to the top.

• This helps improve the quality of the recommendations & reduce bias.

Graph Navigation
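One way to sketch this navigation, assuming an adjacency-list graph of `(neighbour, weight)` lists: merge the start node's own neighbour list with each neighbour's list, then rank candidates by how many of those source lists they appear in, using total edge weight to break ties. The two-hop limit, the scoring rule and the `recommend` name are illustrative assumptions, not necessarily the method used in the talk.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Node = Tuple[str, str]  # (field type, value)
Graph = Dict[Node, List[Tuple[Node, int]]]

def recommend(graph: Graph, start: Node, target_type: str, k: int = 5) -> List[Node]:
    """Merge the start node's neighbour list with its neighbours' lists.
    Candidates found in more source lists rank higher; total weight breaks ties."""
    source_lists = [graph.get(start, [])]
    source_lists += [graph.get(nbr, []) for nbr, _ in graph.get(start, [])]

    hits: Dict[Node, int] = defaultdict(int)
    weight: Dict[Node, int] = defaultdict(int)
    for edges in source_lists:
        for node, w in edges:
            if node != start and node[0] == target_type:
                hits[node] += 1
                weight[node] += w
    ranked = sorted(hits, key=lambda n: (-hits[n], -weight[n], n))
    return ranked[:k]

graph: Graph = {
    ("query", "ipod"): [(("docId", "d1"), 5), (("query", "ipod case"), 3)],
    ("docId", "d1"): [(("docId", "d2"), 4), (("query", "ipod"), 5)],
    ("query", "ipod case"): [(("docId", "d2"), 2), (("docId", "d3"), 6)],
}
print(recommend(graph, ("query", "ipod"), "docId"))
# [('docId', 'd2'), ('docId', 'd3'), ('docId', 'd1')]
```

Here `d2` never co-occurs directly with the query, but it appears in two source lists, so it outranks the direct neighbour `d1`: the diversity and "rise to the top" effects described on the slide.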

Aggregated Record (Sample Query)

Recommendations (For a Query)

Graph Navigation - Example Query

Reminder: Search Results With Click Signals

Recommendations from “Top Clicks”

Recommendations from “Neighbourhood Graph”

Using Signals = Modifying Your Behaviour in Response to Your Environment

Events & Signals